ABSTRACT



A rate control algorithm for an MPEG-2 compliant encoder 
is described. The rate control algorithm has embodiments 
useful for constant bit rate and variable bit rate encoding. In 
particular, the invention relates to adaptive quantization. 
ADAPTIVE QUANTIZATION

BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION 

The present invention relates to a method for encoding video images. In particular, the present invention relates to a method for performing rate control for a video encoder. The inventive rate control technique has an embodiment for constant bit rate encoding and an embodiment for variable bit rate encoding. More specifically, the invention relates to adaptive quantization.

RELATED CASES 

The following cases have been filed on even date herewith, are assigned to the assignee hereof, and contain subject matter related to the present application.

1. U.S. patent application Ser. No. 08/578,231, entitled "Quantization Biased, Activity Based, Inter/Intra Decision" and filed for K. Metin Uz and Aaron Wells.

2. U.S. patent application Ser. No. 08/578,813, entitled "Three Stage Hierarchical Motion Vector Determination" and filed for Didier J. LeGall.

3. U.S. patent application Ser. No. 08/578,230, entitled "Field Frame Macroblock Encoding Decision" and filed for Didier J. LeGall.

4. U.S. patent application Ser. No. 08/578,230, entitled 
"Scene Change Detection" and filed for Didier J. LeGall. 

5. U.S. patent application Ser. No. 08/578,228, entitled "Fade Detection" and filed for Didier J. LeGall.

6. U.S. patent application Ser. No. 08/578,229, entitled "Rate Control with Panic Mode" and filed for Aaron Wells.

7. U.S. patent application Ser. No. 08/578,811, entitled "Statistical Multiplexing" and filed for K. Metin Uz, Aaron Wells, and Didier J. LeGall.

8. U.S. patent application Ser. No. 08/578,227, entitled "Variable Bit Rate Encoding" and filed for Didier J. LeGall, K. Metin Uz, and Aaron Wells.

9. U.S. patent application Ser. No. 08/578,812, entitled 
"Video Encoder Timing Method" filed for Didier J. LeGall, 
K. Metin Uz, and Aaron Wells. 
In a preferred embodiment of the invention, the video encoder is an MPEG-2 compliant encoder. The encoder receives a sequence of frames from a video source. The sequence of frames may be progressive or interlaced. Illustratively, the progressive sequence comprises 30 frames per second. In the case of an interlaced sequence, each frame comprises two fields. A top field comprises the even numbered rows and a bottom field comprises the odd numbered rows. Thus, in the case of an interlaced sequence, there are 60 fields per second.

The video source may be any source of a digital video 
signal such as a video camera or a telecine machine. A 
telecine machine converts a film comprising 24 frames per 
second into a 60 field per second digital video signal using 
3:2 pull down. The 3:2 pull down technique provides for 
generating two video fields and three video fields for alter- 
nating film frames. For a film frame which is converted into 
three video fields, the third field is a repeat of the first field. 
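As an illustration of 3:2 pull down, the Python sketch below maps film frames onto a field sequence. It is a minimal sketch, assuming film frames are opaque labels, that the frame producing two fields comes first (as in the sentence above), and that field parity strictly alternates in the output stream; the function name `pulldown_32` is hypothetical.

```python
def pulldown_32(film_frames):
    """Map film frames to fields using 3:2 pull down.

    Frames alternately produce two and three fields; the third field of a
    three-field frame repeats that frame's first field. Field parity in the
    output stream strictly alternates top, bottom, top, ...
    """
    fields = []
    for i, frame in enumerate(film_frames):
        count = 3 if i % 2 == 1 else 2
        # The next slot in the stream decides which parity this frame starts on.
        first = "top" if len(fields) % 2 == 0 else "bottom"
        second = "bottom" if first == "top" else "top"
        for parity in (first, second, first)[:count]:
            fields.append((frame, parity))
    return fields


print(len(pulldown_32(list(range(24)))))  # 60 fields per 24 film frames
```

Four film frames yield ten fields, which is why 24 film frames per second become 60 fields per second.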

The video encoder utilizes compression algorithms to generate an MPEG-2 compliant bit stream from the input sequence of frames (see ISO/IEC 13818-2).



The MPEG-2 bit stream has six layers of syntax: the sequence layer (random access unit, context), the group of pictures layer (random access unit, video coding), the picture layer (primary coding layer), the slice layer (resynchronization unit), the macroblock layer (motion compensation unit), and the block layer (DCT unit). A group of pictures (GOP) is a set of frames which starts with an I-frame and includes a certain number of P and B frames. The number of frames in a GOP may be fixed or variable. Each frame is divided into macroblocks. Illustratively, a macroblock comprises four luminance blocks and two chrominance blocks. Each block is 8x8 pixels.

The encoder distinguishes between three kinds of frames (or pictures): I, P, and B. Typically, the coding of I frames results in the most bits. In an I-frame, each macroblock is coded as follows. Each 8x8 block of pixels in a macroblock undergoes a discrete cosine transform (DCT) to form an 8x8 array of transform coefficients. The transform coefficients are then quantized with a variable quantizer matrix. Quantization involves dividing each DCT coefficient F[v][u] by a quantizer step size. The quantizer step size for each AC DCT coefficient is determined by the product of a weighting matrix element W[v][u] and a quantization scale factor (also known as mquant). As is explained below, in some cases the quantization scale factor Q_n for a macroblock n is the product of a rate control quantization scale factor Q_n^R and a masking activity quantization scale factor QS_n. However, this factorization of the quantization scale factor Q_n is optional. The use of a quantization scale factor permits the quantization step size for each AC DCT coefficient to be modified at the cost of only a few bits. The quantization scale factor is selected for each macroblock.
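The quantization just described can be illustrated with a simplified Python sketch, assuming Q_n = Q_n^R x QS_n and an AC step size of W[v][u] x Q_n. The function name `quantize_block` is hypothetical, and the sketch is not the normative MPEG-2 arithmetic: ISO/IEC 13818-2 adds fixed scaling, specific rounding rules, and separate treatment of the intra DC coefficient, none of which is reproduced here.

```python
def quantize_block(coeffs, weights, q_rate, q_mask):
    """Simplified quantization of an 8x8 block of DCT coefficients.

    coeffs[v][u]  : DCT coefficients F[v][u]
    weights[v][u] : weighting matrix W[v][u]
    q_rate        : rate control quantization scale factor Q_n^R
    q_mask        : masking activity quantization scale factor QS_n
    The DC term coeffs[0][0] is passed through unchanged here, since MPEG-2
    handles the intra DC coefficient separately.
    """
    q_n = q_rate * q_mask  # total quantization scale factor Q_n
    levels = [[0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            if v == 0 and u == 0:
                levels[v][u] = coeffs[v][u]
                continue
            step = weights[v][u] * q_n  # AC step size W[v][u] * Q_n
            mag = int(abs(coeffs[v][u]) / step + 0.5)  # round half away from zero
            levels[v][u] = mag if coeffs[v][u] >= 0 else -mag
    return levels
```

Increasing either factor enlarges every AC step size, so fewer nonzero levels survive; this is the lever the rate control uses to trade bits for precision.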

The resulting quantized DCT coefficients are scanned 
(e.g., using zig-zag scanning) to form a sequence of DCT 
coefficients. The DCT coefficients are then organized into 
run-level pairs. The run-level pairs are then encoded using a 
variable length code (VLC). In an I-frame, each macroblock 
is encoded according to this technique. 
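The zig-zag scan and run-level pairing described above can be sketched in Python as follows. This is a minimal illustration rather than the MPEG-2 VLC stage: the function names are hypothetical, the DC coefficient is scanned like any other coefficient, and trailing zeros are simply dropped where an MPEG-2 encoder would emit an end-of-block code.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in classic zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked bottom-left to top-right, odd ones reversed.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order


def run_level_pairs(block):
    """Zig-zag scan a quantized block into (run, level) pairs.

    run is the number of zeros preceding each nonzero level; zeros after the
    last nonzero level are dropped.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


blk = [[0] * 8 for _ in range(8)]
blk[0][0], blk[1][0], blk[2][2] = 5, 3, -1
print(run_level_pairs(blk))  # [(0, 5), (1, 3), (9, -1)]
```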

In a P-frame, a decision is made to code each macroblock either as an I macroblock, which is then encoded according to the technique described above, or as a P macroblock. For each P macroblock, a prediction of the macroblock in a previous video frame is obtained. The prediction is identified by a motion vector which indicates the translation between the macroblock to be coded in the current frame and its prediction in the previous frame. (A variety of block matching algorithms can be used to find the particular macroblock in the previous frame which is the best match with the macroblock to be coded in the current frame. This "best match" macroblock becomes the prediction for the current macroblock.) The prediction error between the predictive macroblock and the current macroblock is then coded using the DCT, quantization, zig-zag scanning, run-level pair encoding, and VLC encoding.

In the coding of a B-frame, a decision has to be made as to the coding of each macroblock. The choices are (a) intracoding (as in an I macroblock), (b) unidirectional forward predictive coding using a previous frame to obtain a motion compensated prediction, (c) unidirectional backward predictive coding using a subsequent frame to obtain a motion compensated prediction, and (d) bidirectional predictive coding, wherein a motion compensated prediction is obtained by interpolating a backward motion compensated prediction and a forward motion compensated prediction. In the cases of forward, backward, and bidirectional motion compensated prediction, the prediction error is encoded using DCT, quantization, zig-zag scanning, run-level pair encoding, and VLC encoding.

The P frame may be predicted from an I frame or another P frame. The B frame may also be predicted from an I frame or a P frame. No predictions are made from B frames.

B frames have the smallest number of bits when encoded, then P frames, with I frames having the most bits when encoded. Thus, the greatest degree of compression is achieved for B frames. For each of the I, B, and P frames, the number of bits resulting from the encoding process can be controlled by controlling the quantizer step size (adaptive quantization) used to code each macroblock. A macroblock of pixels or pixel errors which is coded using a large quantizer step size results in fewer bits than if a smaller quantizer step size is used.

After encoding by the video encoder, the bit stream is stored in an encoder output buffer. The encoded bits are then either transmitted via a channel to a decoder, where they are received in a buffer of the decoder, or stored in a storage medium.

The order of the frames in the encoded bit stream is the 
order in which the frames are decoded by the decoder. This 
may be different from the order in which the frames arrived 
at the encoder. The reason for this is that the coded bit stream 
contains B frames. In particular, it is necessary to code the 
I and P frames used to anchor a B frame before coding the 
B frame itself. 

Consider the following sequence of frames received at the input of a video encoder and the indicated coding type (I, P or B) to be used to code each frame:

1  2  3  4  5  6  7  8  9  10 11 12 13
I  B  B  P  B  B  P  B  B  I  B  B  P

For this example there are two B-frames between successive coded P-frames and also two B-frames between successive coded I- and P-frames. Frame "1I" is used to form a prediction for frame "4P", and frames "1I" and "4P" are both used to form predictions for frames "2B" and "3B". Therefore, the order of coded frames in the coded sequence shall be "1I", "4P", "2B", "3B". Thus, at the encoder output, in the coded bit stream, and at the decoder input, the frames are reordered as follows:

1  4  2  3  7  5  6  10 8  9  13 11 12
I  P  B  B  P  B  B  I  B  B  P  B  B
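The reordering rule above (each B frame is held back until the I or P anchor that follows it has been coded) can be expressed as a short Python sketch. The name `coded_order` is hypothetical, and B frames with no later anchor are simply appended at the end, which a real encoder would not do.

```python
def coded_order(types):
    """Reorder pictures from display order into coded (decode) order.

    types: picture types in display order, e.g. "IBBPBBPBBIBBP".
    Returns (display_number, type) tuples, numbered from 1. Each B picture
    is held back until the I or P anchor that follows it has been emitted.
    """
    out, pending_b = [], []
    for num, t in enumerate(types, start=1):
        if t == "B":
            pending_b.append((num, t))
        else:
            out.append((num, t))   # the anchor is coded first ...
            out.extend(pending_b)  # ... then the B pictures it anchors
            pending_b = []
    out.extend(pending_b)  # B pictures with no later anchor: appended as-is
    return out


order = coded_order("IBBPBBPBBIBBP")
print(" ".join(str(n) for n, _ in order))  # 1 4 2 3 7 5 6 10 8 9 13 11 12
print("".join(t for _, t in order))        # IPBBPBBIBBPBB
```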

In the case of interlaced video the following applies. Each frame of interlaced video consists of two fields. The MPEG-2 specification allows the frame to be encoded as a frame picture or the two fields to be encoded as two field pictures. Frame encoding or field encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the first, works better when there is fast movement.

For field prediction, predictions are made independently for the macroblocks of each field by using data from one or more previous fields (P field) or previous and subsequent fields (B field). For frame prediction, predictions are made for the macroblocks in a frame from a previous frame (P frame) or from a previous and subsequent frame (B frame). Within a field picture, all predictions are field predictions. However, in a frame picture either field prediction or frame prediction may be selected on a macroblock by macroblock basis.

An important aspect of any video encoder is rate control. The purpose of rate control is to maximize the perceived quality of the encoded video when it is decoded at a decoder by intelligently allocating the number of bits used to encode each frame and each macroblock within a frame. Note that the encoder may be a constant bit rate (CBR) encoder or a variable bit rate (VBR) encoder. In the case of a constant bit rate encoder, the sequence of bit allocations to successive frames ensures that an assigned channel bit rate is maintained and that decoder buffer exceptions (overflow or underflow of the decoder buffer) are avoided. In the case of a VBR encoder, the constraints are reduced. It may only be necessary to ensure that a maximum channel rate is not exceeded so as to avoid decoder buffer underflow.

In order to prevent a decoder buffer exception, the encoder maintains a model of the decoder buffer. This model is known as the video buffer verifier (VBV) buffer. The VBV buffer models the decoder buffer occupancy. Depending on the VBV occupancy level, the number of bits which may be budgeted for a particular frame may be increased or decreased to avoid a decoder buffer exception.
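A simplified Python model of a CBR VBV buffer is sketched below, assuming bits arrive at a constant R/f bits per frame period and that each frame is removed instantaneously at its decode time. `vbv_check` and `clamp_budget` are hypothetical names; `clamp_budget` shows one way a frame budget could be raised or lowered from the occupancy level, not the specific adjustment used by the invention.

```python
def vbv_check(frame_bits, rate, frame_rate, buffer_size, initial_fullness):
    """Simulate a simplified CBR VBV buffer and list any exceptions.

    Bits enter at rate / frame_rate per frame period; at each decode instant
    the whole frame is removed at once. Returns (frame_index, event) tuples.
    """
    per_frame = rate / frame_rate
    fullness = initial_fullness
    events = []
    for i, bits in enumerate(frame_bits):
        if bits > fullness:  # frame not completely received in time
            events.append((i, "underflow"))
            fullness = 0.0
        else:
            fullness -= bits
        fullness += per_frame
        if fullness > buffer_size:  # channel delivered more than fits
            events.append((i, "overflow"))
            fullness = buffer_size
    return events


def clamp_budget(budget, fullness, per_frame, buffer_size):
    """Raise or lower a frame budget so neither exception can occur."""
    upper = fullness  # cannot remove more bits than are buffered
    lower = max(0.0, fullness + per_frame - buffer_size)  # must leave room
    return min(max(budget, lower), upper)
```

A frame that is too large relative to the buffered bits triggers an underflow; a run of very small frames lets the constant-rate channel overfill the buffer.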

It is an object of the present invention to provide a rate 
control technique for an MPEG-2 compliant encoder. 

Specifically, it is an object of the invention to provide a 
rate control technique for a constant bit rate, real time 
MPEG-2 compliant encoder. 

It is also an object of the invention to provide a rate 
control technique for a variable bit rate, non-real time 
MPEG-2 compliant encoder. 

SUMMARY OF THE INVENTION 

A. Hardware Overview 

In accordance with a preferred embodiment of the invention, a video encoder includes a preprocessing unit, a master unit, and zero or more slave units. Each master or slave unit includes a motion estimation unit and an encoder unit for performing DCT, quantization, zig-zag scan, run-level pair encoding, and VLC encoding. Each master or slave unit is used to encode a section of a picture which, for example, is four macroblocks high. Each master or slave unit has allocated to it a portion of a physical encoder output buffer.
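The division of a picture into sections four macroblocks high can be sketched as below. This is a minimal illustration, assuming one master or slave unit per section with the master taking the top section; the function name and the unit labels are hypothetical.

```python
def assign_sections(frame_height, mb_size=16, rows_per_section=4):
    """Split macroblock rows into sections, one per master/slave unit.

    Returns (unit_label, [macroblock row indices]) tuples; the master unit
    takes the top section and slave units take the rest in order.
    """
    mb_rows = -(-frame_height // mb_size)  # ceiling division
    sections = []
    for k, start in enumerate(range(0, mb_rows, rows_per_section)):
        unit = "master" if k == 0 else f"slave-{k}"
        rows = list(range(start, min(start + rows_per_section, mb_rows)))
        sections.append((unit, rows))
    return sections


for unit, rows in assign_sections(480):
    print(unit, rows)
```

For a 480-line picture there are 30 macroblock rows, so this sketch yields eight sections, the last only two macroblock rows high.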

Each master or slave unit also has a controller. The 
controllers of the slave units communicate with and are 
controlled by the controller of the master unit. The control- 
lers in the master and slave units and the preprocessing unit 
cooperate to perform rate control for the encoder. 

B. Rate Control Overview 

In accordance with the present invention, a preferred rate 
control algorithm has the following features: 

(1) A bit budget BB_i is established for each frame i by allocating the total available coding rate R_e to each frame i based on the number of bits used to code the previous frame of the same type and the average quantization scale factor used to code the previous frame of the same type, relative to the bits used and average quantization scale factor for the previous frames of the other types, and on the relative frequency of each frame type.

(2) The bit budget for each frame is allocated to the 
individual sections of the frame coded by the individual 
master or slave units based on a complexity measure 
for each section. 

(3) The bit budget for each section is then allocated to 
each macroblock in the section based on a total activity 
measure for the macroblock. (A description of the total 
activity measure is provided below). 

(4) Virtual buffers V_I, V_P, and V_B, corresponding to frame types I, P, and B, provide rate control feedback by adjusting the quantization scale factor. A rate control quantization scale factor Q_n^R for a macroblock n in frame i is determined as a function of the ratio of virtual buffer fullness to virtual buffer size.

(5) A masking activity is determined for each macroblock which measures the amount of visual local masking in the macroblock. The rate control quantization scale factor determined from virtual buffer fullness is multiplied by a masking activity quantization scale factor which is dependent on the macroblock masking activity to obtain a total quantization scale factor.

(6) The bit budget BB_i for a current frame i is increased or decreased based on the VBV buffer occupancy level to prevent VBV buffer overflow or underflow.

(7) The rate control may initiate a panic mode. A panic mode arises when a scene is encountered which generates too many bits, even when the quantization scale factor is set to the maximum size. In this case the encoder is in danger of generating too many bits for the channel to transfer to the decoder, thereby causing a "VBV underflow" bit stream error. The encoder then enters the panic mode, in which quality is sacrificed to guarantee a legal bit stream.

(8) The rate control algorithm takes into account changes in the effective coding rate R_e. For a CBR encoder, the rate R_e may change because a particular encoder may be sharing a channel with a number of other encoders. A statistical multiplexing controller may change the fraction of the channel bandwidth allocated to the particular encoder. For a VBR encoder, the effective encoding rate R_e will change at various points in the bit stream. The changes in rate are accounted for in VBV buffer enforcement.

(9) The rate control algorithm also accounts for inverse telecine processing by the encoder when allocating bit budgets to particular frames. Inverse telecine processing involves detecting and skipping repeated fields in a field sequence output by a telecine machine to the encoder. In particular, the effective frame rate f_e is given by

$$ f_e = \frac{2f}{T_f} $$

where T_f is the average number of fields in a frame, and f is the nominal frame rate (as specified in a sequence header). For example, for 3:2 pull down material T_f = 2.5, so a nominal rate of 30 frames per second gives f_e = 24 frames per second.

(10) The encoder can detect scene changes. The rate control algorithm is modified as a result of scene changes. In particular, a new GOP is started when a scene change is detected. Default values are used to allocate bits to the first I, P, and B frames in the new scene. The default value for the I frame depends on bit rate, VBV fullness, and frame activity. The default values for the P and B frames are determined from the I frame default value. In addition, the initial quantization scale factor used in the first macroblock of the first frame of each type in the new scene is a function of the bit budget for the frame and the total activity of the particular frame. The total activity for a frame is the sum of the total activities for the macroblocks in the frame. The initial rate control quantization scale factor for a frame of each type (I, P, B) is used to determine the initial occupancies of the corresponding virtual buffers V_I, V_P, V_B. These occupancies are then updated to obtain subsequent rate control quantization scale factors.



(11) The encoder can detect fades (fade to black or fade 
to white) and account for a fade in the rate control 
algorithm. 
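Items (1) through (5) and (9) above can be illustrated with the Python sketch below. It is loosely modeled on the MPEG-2 Test Model 5 (TM5) rate control, which is an assumption rather than the method disclosed here: complexity is taken as bits used times average quantization scale factor, and the weighting constants, the linear mapping from virtual buffer fullness to a 1 to 31 quantizer range, and the TM5-style masking normalization are all hypothetical choices. Every function and class name is illustrative. The total quantization scale factor for macroblock n is formed as Q_n^R x QS_n.

```python
from dataclasses import dataclass


def effective_frame_rate(nominal_rate, avg_fields_per_frame):
    """Item (9): f_e = 2 f / T_f."""
    return 2.0 * nominal_rate / avg_fields_per_frame


def frame_budgets(rate, frame_rate, complexity, frequency, weight=None):
    """Item (1): TM5-like split of the per-frame rate among I, P, and B.

    complexity[t]: bits used x average Q for the previous frame of type t
    frequency[t] : fraction of frames of type t (the fractions sum to 1)
    weight[t]    : hypothetical visual weighting constant (defaults to 1)
    The frequency-weighted average budget equals rate / frame_rate.
    """
    weight = weight or {t: 1.0 for t in complexity}
    share = {t: complexity[t] / weight[t] for t in complexity}
    norm = sum(frequency[t] * share[t] for t in share)
    return {t: (rate / frame_rate) * share[t] / norm for t in share}


def split_proportionally(total, measures):
    """Items (2)-(3): divide a budget in proportion to complexity or activity."""
    s = sum(measures)
    return [total * m / s for m in measures]


@dataclass
class VirtualBuffer:
    """Item (4): one instance per picture type (V_I, V_P, V_B)."""
    size: float
    fullness: float

    def update(self, bits_used, bits_budgeted):
        # Overspending fills the buffer (coarser Q); underspending drains it.
        self.fullness += bits_used - bits_budgeted

    def rate_q(self):
        # Hypothetical linear map of fullness / size onto Q_n^R in 1..31.
        q = round(31 * self.fullness / self.size)
        return min(max(q, 1), 31)


def masking_q(ma, avg_ma):
    """Item (5): TM5-style masking factor QS_n, between 0.5 and 2."""
    return (2 * ma + avg_ma) / (ma + 2 * avg_ma)


vb = VirtualBuffer(size=1000.0, fullness=100.0)
q_n = vb.rate_q() * masking_q(ma=40.0, avg_ma=20.0)  # Q_n = Q_n^R x QS_n
print(q_n)  # 3.75
```

Macroblocks with above-average masking activity get a larger QS_n (coarser quantization), where the artifacts are less visible.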

The MPEG-2 compliant encoding technique has sev- 
eral other important features useful for generating an 
MPEG-2 compliant bit stream. 

(1) An inter/intra decision is made for each macroblock in 
a P or B frame. An intra-bias (IB) used in the decision 
takes into account the quantization scale factor for the 
macroblock. 

(2) A motion vector is selected for each macroblock to be 
inter-coded. In the case of an interlaced sequence, it is 
desirable to pick between a frame-based motion vector 
and a plurality of field-based motion vectors. A three 
stage hierarchical procedure is provided to obtain a 
motion vector for each macroblock to be inter-coded. 

(3) For each macroblock in a frame which utilizes frame based encoding, a decision is made whether to use field- or frame-based encoding (DCT, quantization, etc.). The present invention makes the field/frame encoding decision for each macroblock based on comparing (a) the total activity of the frame macroblock and (b) the sum of the total activities of the two field macroblocks. (A macroblock in an interlaced frame may be viewed as comprising macroblocks in each of the fields which comprise the frame. Each such field macroblock contributes half the rows to the frame macroblock.) The smaller of (a) and (b) determines which mode to use.
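The field/frame comparison in item (3) can be sketched in Python as follows, assuming the block activity definition given in section B of the Detailed Description (sum of absolute differences of horizontal and vertical pixel pairs) and breaking ties in favour of frame encoding, which the text does not specify. The function names are hypothetical.

```python
def activity(rows):
    """Sum of |differences| over horizontal and vertical pixel pairs."""
    h = sum(abs(a - b) for row in rows for a, b in zip(row, row[1:]))
    v = sum(abs(a - b) for r0, r1 in zip(rows, rows[1:]) for a, b in zip(r0, r1))
    return h + v


def luma_blocks(mb):
    """The four 8x8 blocks of a 16x16 macroblock (a list of 16 rows)."""
    return [[row[c:c + 8] for row in mb[r:r + 8]] for r in (0, 8) for c in (0, 8)]


def field_frame_decision(mb):
    """Compare frame activity with summed field activity; the smaller wins."""
    frame_act = sum(activity(b) for b in luma_blocks(mb))
    # Each 8x8 block splits into one 8x4 block per field (alternate rows).
    field_act = sum(activity(b[0::2]) + activity(b[1::2]) for b in luma_blocks(mb))
    return "field" if field_act < frame_act else "frame"
```

A macroblock whose alternate rows differ sharply (typical of motion between the two fields) has near-zero field activity and is coded in field mode, while a smooth vertical gradient favours frame mode.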

The above-described rate control technique may be used 
as part of a single pass real time constant bit rate encoding 
process. 

C. Variable Bit Rate Encoding Rate Control

In some cases it may be desirable to use variable bit rate
(VBR) encoding. In particular, it may be desirable to provide 
for non-real time variable bit rate encoding. Such encoding 
may utilize multiple encoding passes. The variable rate 
encoding of a sequence of frames (or pictures) proceeds as 
follows: 

(1) In a first coding pass, VBV enforcement is disabled. 
In addition, in the first pass, the rate control quantiza- 
tion scale factor is maintained as fixed. However, the 
masking activity quantization scale factor is allowed to 
vary for different macroblocks. 

(2) From step (1), the number of bits used to encode each frame in the input sequence in the first encoding pass is determined. Then, a bit budget for each frame in the sequence is determined from the number of bits used to encode each frame in the first pass such that (a) an overall target for the number of bits used to code the entire frame sequence is not exceeded, and (b) R_max, the maximum channel bit rate, is not violated. To accomplish this, the bit budget for each frame is modified so that the VBV buffer does not underflow. It is not necessary to worry about VBV overflow for a VBR encoder.

(3) The input sequence is then coded again in a second pass using the bit budgets determined in step (2). There is no VBV enforcement during the second encoding pass, as any possible VBV underflow has been accounted for in step (2). Instead, the cumulative coding budget deviation CE_i is maintained. That is, the difference between the actual number of bits BU_i used to code each frame and its bit budget BB_i is accumulated over the successive frames that are coded. Therefore, for frame i,

$$ CE_i = CE_{i-1} + BU_i - BB_i $$

The budget BB_{i+1} for frame i+1 is modified by an amount proportional to the cumulative budget deviation CE_i. In the second pass, the rate control quantization scale factor is not necessarily fixed and may vary in response to virtual buffer fullness.
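Steps (2) and (3) can be sketched in Python as follows. The proportional scaling of first-pass bits to the overall target, the clamp used to avoid VBV underflow, the feedback gain, and the `encode_frame` callback are all assumptions for illustration, since the text does not specify them; function names are hypothetical. CE_i is accumulated against the planned budget BB_i, and each frame's budget is reduced by gain x CE_{i-1} before coding.

```python
def second_pass_budgets(first_pass_bits, total_target):
    """Step (2), assumed form: scale first-pass bits to the overall target."""
    scale = total_target / sum(first_pass_bits)
    return [b * scale for b in first_pass_bits]


def limit_for_vbv(budgets, r_max, frame_rate, buffer_size, initial_fullness):
    """Step (2): lower budgets where the VBV buffer would otherwise underflow.

    The buffer fills at up to r_max / frame_rate bits per frame period and,
    in VBR operation, simply stops filling when full (no overflow).
    """
    per_frame = r_max / frame_rate
    fullness = initial_fullness
    limited = []
    for bb in budgets:
        bb = min(bb, fullness)  # never spend more bits than are buffered
        limited.append(bb)
        fullness = min(fullness - bb + per_frame, buffer_size)
    return limited


def second_pass(budgets, encode_frame, gain=0.5):
    """Step (3): track CE_i and correct each next budget by gain * CE.

    encode_frame(i, budget) stands in for the encoder and returns BU_i.
    """
    ce = 0.0
    used = []
    for i, bb in enumerate(budgets):
        bu = encode_frame(i, bb - gain * ce)  # budget corrected by deviation so far
        ce += bu - bb  # CE_i = CE_{i-1} + BU_i - BB_i
        used.append(bu)
    return used, ce
```

With an encoder that overshoots every budget by a fixed amount, the feedback shrinks later budgets so the cumulative deviation grows more slowly than it would with no correction.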

The foregoing variable bit rate technique may be modified 
by adding a zero pass (i.e. a pass prior to the first pass) in 
which the encoder performs scene detection, fade detection, 
and inverse telecine processing. In the zero pass, total 
activity and masking activity may be calculated for each 
macroblock. In addition, the first two stages of the three 
stage motion estimation hierarchy may be performed in the 
zero pass. This enables certain motion estimation statistics to 
be obtained which are useful for scene detection. 

BRIEF DESCRIPTION OF THE DRAWING 

FIG. 1A schematically illustrates an architecture of an 
encoder which may be utilized to generate an MPEG-2 
compliant bit stream, in accordance with the invention. 

FIG. 1B illustrates a master or slave unit for use in the
encoder of FIG. 1A. 

FIG. 2A illustrates a frame based macroblock. 

FIG. 2B illustrates a field based macroblock. 

FIG. 3 is a plot which indicates how an inter/intra coding 
decision for a macroblock is to be made. 

FIG. 4A illustrates field based motion vectors. 

FIG. 4B is a flow chart for a three stage hierarchical 
motion estimation algorithm. 

FIG. 5 illustrates the dependence of a rate control quan- 
tization scale factor for a macroblock on total activity and bit 
budget after a scene change is detected. 

FIG. 6 illustrates a fade to black and a fade to white. 

FIG. 7 illustrates an end-to-end encoder and decoder 
system. 

FIG. 7A illustrates a statistical multiplexing system in 
which a plurality of encoders of the type shown in FIG. 1A 
communicate with a central control unit. 

FIGS. 7B and 7C are flow charts of algorithms carried out 
by the central controller of FIG. 7A. 

FIG. 8 is a flow chart for rate control for non-real time, 
multiple pass, variable bit rate encoding. 

FIG. 8A shows how the bit budget for each frame in the 
variable bit rate encoding process is adjusted to prevent 
VBV underflow. 

FIG. 9 is a flow chart of an overall implementation of an 
encoder according to the invention. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The following is an outline of the sections which appear 
in the Detailed Description of the Invention: 

A. Encoder Architecture 

B. Total Activity, Masking Activity 

C. Quantization Biased, Activity Based, Inter/Intra Decision

D. Three Stage Hierarchical Motion Vector Determination

E. Field/Frame Macroblock Encoding Decision 

F. Scene Detection 

G. Fade Detection 

H. Rate Control 

H.1 Bit Budget Determination
H.2 Virtual Buffers for Determining Quantization Level

H.3 Video Buffer Verifier 

H.4 VBV Underflow Protection and Panic 

H.5 VBV Overflow Protection 

H.6 Adaptive Quantization 

H.7 Statistical Multiplexing and Rate Changes 

H.8 Inverse Telecine Processing 

I. Rate Control for Variable Bit Rate Encoder

J. Overall Encoder Rate Control Implementation 



A. Encoder Architecture 

An encoder which can carry out the inventive rate control and encoding algorithm of the present invention is illustrated in FIG. 1A. The encoder 10 of FIG. 1A comprises a preprocessing unit 20, a master unit 30, and a plurality of slave units 40-1, 40-2, . . . , 40-N. The encoder also includes a physical encoder output buffer 50 which is connected to a channel 51 for transmitting the encoded bit stream.

The architecture of the master unit 30 and a slave unit 40 is illustrated schematically in FIG. 1B. Each unit 30, 40 comprises a motion estimation unit 70 for determining motion vectors and an encoding unit 80 for performing DCT, quantization, run-level pair encoding, and variable length encoding. A portion of the physical encoder buffer 50 is allocated to each master or slave unit 30, 40. In addition, each unit 30, 40 includes a controller 90 in communication with the motion estimation unit 70, the encoding unit 80, and the physical buffer 50. The controllers in the slave units operate under the control of the controller in the master unit.

In the preprocessing unit, the incoming video sequence is captured and buffered for reordering. The preprocessing unit determines a total activity and a masking activity for each macroblock in each frame in the input sequence. The total activity is used for buffer management and the masking activity is used for adaptive quantization.

Each unit (master 30 or slave 40) encodes a section of a frame. Each section illustratively comprises four rows of macroblocks. The motion vectors for the macroblocks in each section are determined by the motion estimation units in the corresponding master or slave unit. A three stage hierarchical motion estimation algorithm is used to find a motion vector. The first two stages of the three stage algorithm are run one frame in advance of actual encoding and may be done simultaneously with the activity calculations of the preprocessor.

Certain motion estimation results and the activities are 
provided to the controller in the master unit for scene change 
detection, fade detection and the like. 

The rate control algorithm is carried out by the controller in the master unit together with the controllers in the slave units.

B. Total Activity, Masking Activity 

The preprocessing unit of the inventive encoder determines two activity measures for each macroblock for each frame to be encoded.

Specifically, for each macroblock, there is determined a masking activity which measures the amount of local visual masking and a total activity which is used to determine the number of bits with which the macroblock is encoded.

For each block b_k in the frame, a frame activity measure fr_k is determined. In the case of interlaced video, half of the rows of the block are in one field and half of the rows are in the other field. Two field activity measures for the block b_k are computed, where fi1_k is the field activity of the portion of the block b_k in the first field and fi2_k is the field activity of the portion of the block b_k in the second field. Illustratively, a macroblock is 16x16 pixels and a block is 8x8 pixels. In the case of interlaced video, each 8x8 pixel block comprises an 8x4 block in the first field and an 8x4 block in the second field.

The frame activity or field activity for a block is determined by summing the absolute differences of horizontal pixel pairs and vertical pixel pairs. FIG. 2A shows an 8x8 block in a macroblock in a frame. The frame activity measure for this block is the sum of the absolute differences of all the horizontal pixel pairs (e.g. pair a and pair b) and the absolute differences of all the vertical pixel pairs (e.g. pair c and pair d). FIG. 2B shows a block in one of the fields that comprise the frame. This "field block" is 8x4 pixels as alternate rows of pixels are in the other field. The field activity measure for this block is the sum of the absolute differences of all the horizontal pixel pairs (e.g. pair e and pair f) and the absolute differences of all the vertical pixel pairs (e.g. pair g and pair h).

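Then, for each macroblock n in the frame, the total activity ta_n and the masking activity ma_n are computed as

$$ \mathrm{ta}_n = \min\left( \sum_k \mathrm{fr}_k ,\; \sum_k \mathrm{fi1}_k + \sum_k \mathrm{fi2}_k \right) $$

$$ \mathrm{ma}_n = \min_k \left( \mathrm{fr}_k ,\; \mathrm{fi1}_k + \mathrm{fi2}_k \right) $$

In the expression for ta_n, the sums run over the four luma blocks of macroblock n. In the expression for ma_n, the minimum is over 12 luma blocks: the 4 that comprise macroblock n and 8 surrounding blocks.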

A total activity TA_i for a frame i may be obtained by summing the total activities for all the macroblocks in the frame.
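The activity measures of this section can be sketched in Python as follows. This is a minimal illustration, not the preprocessing unit's implementation: frames are lists of rows of luma values, the 8 surrounding blocks are assumed to be the two edge-adjacent blocks above, below, left, and right of the macroblock, and blocks falling outside the frame are skipped. All function names are hypothetical.

```python
def activity(rows):
    """Sum of |differences| over horizontal and vertical pixel pairs."""
    h = sum(abs(a - b) for row in rows for a, b in zip(row, row[1:]))
    v = sum(abs(a - b) for r0, r1 in zip(rows, rows[1:]) for a, b in zip(r0, r1))
    return h + v


def block_activities(frame, top, left):
    """(fr_k, fi1_k, fi2_k) for the 8x8 luma block at pixel (top, left)."""
    b = [row[left:left + 8] for row in frame[top:top + 8]]
    return activity(b), activity(b[0::2]), activity(b[1::2])


def total_activity(frame, mb_row, mb_col):
    """ta_n = min(sum fr_k, sum fi1_k + sum fi2_k) over the macroblock's 4 blocks."""
    acts = [block_activities(frame, 16 * mb_row + dy, 16 * mb_col + dx)
            for dy in (0, 8) for dx in (0, 8)]
    frame_sum = sum(fr for fr, _, _ in acts)
    field_sum = sum(f1 + f2 for _, f1, f2 in acts)
    return min(frame_sum, field_sum)


def masking_activity(frame, mb_row, mb_col):
    """ma_n = min over up to 12 blocks of min(fr_k, fi1_k + fi2_k)."""
    by, bx = 2 * mb_row, 2 * mb_col  # block coordinates of the top-left block
    own = [(by + i, bx + j) for i in (0, 1) for j in (0, 1)]
    # Assumed neighbourhood: two edge-adjacent blocks on each side.
    ring = [(by - 1, bx), (by - 1, bx + 1), (by + 2, bx), (by + 2, bx + 1),
            (by, bx - 1), (by + 1, bx - 1), (by, bx + 2), (by + 1, bx + 2)]
    n_by, n_bx = len(frame) // 8, len(frame[0]) // 8
    values = []
    for i, j in own + ring:
        if 0 <= i < n_by and 0 <= j < n_bx:  # skip blocks outside the frame
            fr, f1, f2 = block_activities(frame, 8 * i, 8 * j)
            values.append(min(fr, f1 + f2))
    return min(values)


def frame_total_activity(frame):
    """TA_i: sum of ta_n over all macroblocks of the frame."""
    return sum(total_activity(frame, r, c)
               for r in range(len(frame) // 16) for c in range(len(frame[0]) // 16))
```

Because ma_n takes a minimum over neighbouring blocks, a busy macroblock next to a flat area gets a low masking activity, so coarse quantization is not applied where artifacts would show at the boundary.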

C. Quantization Biased, Activity Based, Inter/Intra Decision

The inter/intra decision is the decision between coding a macroblock in a P or B frame with prediction from a reference frame (inter-coding) or without prediction (intra-coding). The decision is taken to give the most effective encoding (best quality). In the inventive technique, the decision is biased against intra-coding as the quantization scale factor Q_n for the macroblock n increases.

The total activity ta_n of a macroblock n has been defined above. This activity is a measure of the intra-activity (IA_n) of the macroblock.

The displaced frame difference activity (DFDA) is the total activity (calculated in the manner described above in section B) of the macroblock of pixel error values resulting from subtracting the motion compensated prediction of a macroblock from the macroblock to be encoded.

The intra-bias (IB) of a macroblock to be encoded is, for example, given by

IB=f{Q n , M„X0«xM rt yconstant 

where Q_n is the quantization scale factor to be used to code the macroblock.

The inter/intra decision for the macroblock n is determined as follows:

If IA_n < threshold, use intra-coding.

If DFDA_n > IA_n + IB_n, use intra-coding.

Otherwise, use inter-coding.

FIG. 3 plots IA on the horizontal axis and DFDA on the 
vertical axis. The shaded region is the region for which 
intra-coding is to be used. 

Note from FIG. 3 that the bias IB_n works against intra coding as Q_n increases. This is an important feature of the inventive inter/intra decision algorithm. The reason is as follows. If a macroblock is to be coded with a high quantization step size (as determined by the rate control algorithm and the visual content of the block), its representation when coded intra will be poor, while the inter representation will be better.
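The decision rule of this section can be written as a short Python function. Because the exact form of IB is not legible in this text, the bias is passed in as a callable; the example bias Q_n x IA_n / 20 is purely hypothetical and only satisfies the stated property that IB_n grows with Q_n. Function names are illustrative.

```python
def inter_intra_decision(ia, dfda, q, threshold, bias_fn):
    """Return "intra" or "inter" for one macroblock in a P or B frame.

    ia      : intra activity IA_n (the total activity ta_n)
    dfda    : displaced frame difference activity DFDA_n
    q       : quantization scale factor Q_n
    bias_fn : hypothetical callable giving IB_n from (Q_n, IA_n)
    """
    if ia < threshold:
        return "intra"
    if dfda > ia + bias_fn(q, ia):
        return "intra"
    return "inter"


def example_bias(q, ia):
    # Purely illustrative: grows with Q_n, as the text requires.
    return q * ia / 20


print(inter_intra_decision(1000, 1100, 1, 100, example_bias))  # intra
print(inter_intra_decision(1000, 1100, 4, 100, example_bias))  # inter
```

The same macroblock flips from intra to inter coding as Q_n rises, which is the bias against intra coding at coarse quantization shown in FIG. 3.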

D. Three Stage Hierarchical Motion Vector Determination

In the case of an interlaced sequence, a three stage 
hierarchical motion estimation algorithm is utilized to obtain 
a motion vector. The first two stages are performed while the 
previous frame is being encoded. 

A motion vector is a two dimensional vector used for 
motion compensation that provides an offset from the coor- 
dinate position in the current frame or field to the coordi- 
nates in a reference frame or reference field. 

Consider a macroblock in a frame F to be encoded. This macroblock has two fields designated F_1 and F_2. Consider a reference frame P from which a predictive macroblock is to be obtained. The reference frame has two fields designated P_1 and P_2. The fields F_1, F_2, P_1, P_2 are shown in FIG. 4A. There are four possible field-based motion vectors, P_1→F_1, P_2→F_2, P_1→F_2, P_2→F_1, and one possible frame-based motion vector. The field-based motion vectors are also shown in FIG. 4A.

The three stage hierarchy for obtaining the motion vector 
is explained in connection with the flowchart FIG. 4B. 

In the first stage of the three stage hierarchy, the
macroblock to be encoded is decimated by a factor of four
horizontally and by a factor of four vertically (step 100 of
FIG. 4B). This results in a macroblock which comprises 4x4
pixels, so that there is a 4x2 macroblock in field F_1 and a
4x2 macroblock in field F_2. A search area in the frame P is
similarly decimated.

Typically, a 4x4 decimated frame macroblock and a 4x2 
decimated field macroblock are sufficiently small targets that 
a block matching algorithm for finding a best match within 
a search area may not work properly. Accordingly, to obtain 
a motion vector in this first stage of the three stage hierarchy, 
a pair of adjacent 4x4 decimated frame macroblocks are 
grouped together to form a target for obtaining a frame based 
motion vector. Similarly, four 4x2 decimated field macrob- 
locks are grouped together (into an 8x4 macroblock set) to 
form a target for obtaining the field based motion vectors. 

Then the four possible field-based motion vectors and the 
frame-based motion vector are determined for each deci- 
mated macroblock and decimated search area (step 102). 
Numerous block matching techniques are available for find- 
ing a motion vector between a macroblock (or set of 
macroblocks) of pixels in a current frame or field and a 
search area in a previous frame or field. The frame-based 
motion vector and field-based motion vector resulting in the 
best prediction are retained (step 104). 
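The decimation and block matching of stage 1 can be sketched as follows. The sketch assumes 8-bit luminance stored as lists of rows, box-filter (block-average) decimation, and the sum of absolute differences (SAD) as the matching criterion; the source leaves the decimation filter and the block matching technique open, so these choices and the function names are illustrative. A vector (dx, dy) means the current block at (bx, by) is predicted from the reference block at (bx + dx, by + dy). A grouped target, such as the 8x4 set of decimated field macroblocks, is simply a larger `bw` by `bh` block.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]


def decimate(img: Image, fx: int, fy: int) -> Image:
    """Average each non-overlapping fy-by-fx block (box-filter decimation)."""
    rows, cols = len(img) // fy, len(img[0]) // fx
    return [[sum(img[y * fy + j][x * fx + i] for j in range(fy) for i in range(fx)) // (fx * fy)
             for x in range(cols)]
            for y in range(rows)]


def full_search(cur: Image, ref: Image, bx: int, by: int, bw: int, bh: int,
                radius: int) -> Optional[Tuple[int, int, int]]:
    """Exhaustive SAD search; returns (sad, dx, dy) of the best candidate."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the reference.
            if not (0 <= by + dy and by + dy + bh <= len(ref)
                    and 0 <= bx + dx and bx + dx + bw <= len(ref[0])):
                continue
            cost = sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
                       for j in range(bh) for i in range(bw))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
    # Current picture = reference shifted so the true motion is (dx, dy) = (-4, 8).
    cur = [[ref[y + 8][x - 4] if y + 8 < 64 and x >= 4 else 0 for x in range(64)]
           for y in range(64)]
    # Stage 1: search the 4x4-decimated pictures, then scale the vector back up.
    cost, dx, dy = full_search(decimate(cur, 4, 4), decimate(ref, 4, 4), 4, 4, 4, 4, 3)
    print((dx * 4, dy * 4))
```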

In the second stage of the hierarchy, the macroblock to be 
encoded is decimated by a factor of two in the horizontal 
direction and a factor of two in the vertical direction (step 
106). This results in an 8x8 macroblock comprising an 8x4
macroblock in field F_1 and an 8x4 macroblock in field F_2.
The search area in the reference frame is also decimated by 
a factor of two horizontally and a factor of two vertically 
(step 106). The selected field-based motion vector and the 
frame-based motion vector are refined (step 108). The one of 
the two refined motion vectors which results in the best 
prediction is then selected (step 110). This determines which 
mode is used to obtain the final motion vector for the 
macroblock. The mode will either be a frame-based motion 
vector or the selected one of the four possible field-based 
motion vectors, depending on the results of stage 1 and stage 
2. 

In the third stage of the hierarchy, the motion vector is
obtained using the selected mode (step 112). The full
resolution macroblock and full resolution search area in the
reference frame are utilized. This motion vector is then
refined to half-pixel resolution.
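Stages 2 and 3 both refine a vector by searching a small neighborhood around the vector handed down from the coarser stage. The sketch below keeps vectors in half-pel units so that one routine covers both: a step of 2 is a ±1 pel integer refinement, and a step of 1 is the final half-pel refinement. It assumes SAD matching and MPEG-2 style half-pel interpolation (rounded averages of the two or four nearest pixels); the names and the 3x3 neighborhood are illustrative, and doubling the vector between resolution levels is left to the caller.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]


def sample(ref: Image, x2: int, y2: int) -> int:
    """Reference pixel at half-pel position (x2 / 2, y2 / 2), rounded average."""
    x, fx = divmod(x2, 2)
    y, fy = divmod(y2, 2)
    if fx and fy:
        return (ref[y][x] + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4
    if fx:
        return (ref[y][x] + ref[y][x + 1] + 1) // 2
    if fy:
        return (ref[y][x] + ref[y + 1][x] + 1) // 2
    return ref[y][x]


def refine(cur: Image, ref: Image, bx: int, by: int, bw: int, bh: int,
           center: Tuple[int, int], step: int) -> Optional[Tuple[int, int, int]]:
    """Search the 3x3 neighborhood of center=(hx, hy), in half-pel units."""
    h2, w2 = 2 * (len(ref) - 1), 2 * (len(ref[0]) - 1)
    best = None
    for hy in (center[1] - step, center[1], center[1] + step):
        for hx in (center[0] - step, center[0], center[0] + step):
            # Every sampled position (and its interpolation neighbors) must exist.
            if not (2 * bx + hx >= 0 and 2 * (bx + bw - 1) + hx <= w2
                    and 2 * by + hy >= 0 and 2 * (by + bh - 1) + hy <= h2):
                continue
            cost = sum(abs(cur[by + j][bx + i] - sample(ref, 2 * (bx + i) + hx, 2 * (by + j) + hy))
                       for j in range(bh) for i in range(bw))
            if best is None or cost < best[0]:
                best = (cost, hx, hy)
    return best


if __name__ == "__main__":
    rng = random.Random(2)
    ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
    cur = [[ref[y + 2][x - 1] if y + 2 < 32 and x >= 1 else 0 for x in range(32)]
           for y in range(32)]
    integer = refine(cur, ref, 8, 8, 8, 8, (0, 4), 2)      # +/-1 pel around the coarse vector
    half = refine(cur, ref, 8, 8, 8, 8, integer[1:], 1)    # +/-1/2 pel
    print(integer, half)
```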

It should be noted that it might be necessary to compute
several motion vectors for a macroblock to be encoded using
this procedure. For example, in the case of a macroblock in
a B frame, it may be necessary to determine a motion vector
for forward prediction, a motion vector for backward
prediction and an interpolated motion vector. The decision to
use forward prediction, backward prediction or interpolated
prediction is made based on the motion vectors obtained
after stage 3 of the hierarchy.

E. Field/Frame Macroblock Encoding Decision 

For a frame in an interlaced sequence, each field may be
coded separately or frame-based coding may be utilized. It
may be preferable to use frame-based encoding because it is
more flexible.

If frame-based encoding is chosen, for each macroblock
of pixels (in the case of an I macroblock) or each
macroblock of predictive errors (in the case of P or B
macroblocks) the encoding (DCT, quantization, zig-zag
scanning, run-level pair encoding, variable length encoding)
may be field-based or frame-based.

For the macroblock n, the following is determined:

iff X jh< £ Mk+X Jr2 k ) 
\ bj&i btfin b&n / 

use frame encoding; otherwise, use field encoding for the
macroblock n.

F. Scene Change Detection 

In the inventive encoding method, scene changes are 
detected by the controller in the master unit. Scene changes 
are detected as follows: 

1. The controller in the master unit maintains an average
of the total activity TA over the frames of a scene.

2. The controller in the master unit maintains the deviation
of TA from its average for the frames in a scene.

3. The controller in the master unit maintains an average
motion estimation score for the frames in a scene and
a deviation from that average. The motion estimation
score of a macroblock is, for example, a sum of the
absolute differences of the pixels in the macroblock and
the corresponding pixels in the prediction of the
macroblock in the reference frame. The motion estimation
score of a frame is the sum of the motion estimation
scores of the macroblocks in the frame. The prediction
may be the prediction obtained in stage one of the three
stage motion vector determination.

4. If the deviation from average of TA in a particular
frame and the deviation from average of the motion
estimation score in the particular frame exceed their
expected deviations by a threshold factor (e.g., a factor
of 10), a scene change is detected.
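A minimal sketch of steps 1 to 4, together with the reset of the statistics that the text prescribes once a scene change is detected. It assumes exponentially weighted running averages (the source does not say how the averages are maintained), a mean absolute deviation as the deviation measure, and a small floor on the expected deviation so that a perfectly static opening does not trigger; `alpha` and `min_dev` are hypothetical parameters.

```python
from typing import Optional


class SceneChangeDetector:
    """Tracks the average and deviation of frame total activity (TA) and of the
    motion estimation score; flags a frame whose deviations both exceed
    `factor` times the expected deviations."""

    def __init__(self, factor: float = 10.0, alpha: float = 0.1, min_dev: float = 0.01):
        self.factor = factor    # threshold factor (10 in the example above)
        self.alpha = alpha      # hypothetical smoothing weight for the running statistics
        self.min_dev = min_dev  # hypothetical floor, as a fraction of the average
        self.ta_avg: Optional[float] = None
        self.me_avg: Optional[float] = None
        self.ta_dev = 0.0
        self.me_dev = 0.0

    def update(self, ta: float, me: float) -> bool:
        if self.ta_avg is None or self.me_avg is None:
            self.ta_avg, self.me_avg = float(ta), float(me)  # first frame seeds the statistics
            return False
        dev_ta, dev_me = abs(ta - self.ta_avg), abs(me - self.me_avg)
        exp_ta = max(self.ta_dev, self.min_dev * self.ta_avg)
        exp_me = max(self.me_dev, self.min_dev * self.me_avg)
        if dev_ta > self.factor * exp_ta and dev_me > self.factor * exp_me:
            # New scene: averages restart at this frame, deviations at its
            # distance from the previous scene's averages.
            self.ta_avg, self.ta_dev = float(ta), dev_ta
            self.me_avg, self.me_dev = float(me), dev_me
            return True
        a = self.alpha
        self.ta_avg += a * (ta - self.ta_avg)
        self.me_avg += a * (me - self.me_avg)
        self.ta_dev += a * (dev_ta - self.ta_dev)
        self.me_dev += a * (dev_me - self.me_dev)
        return False
```

Requiring both measures to jump keeps, for example, a sudden flash (large TA change but a motion estimation score that stays predictable) from being flagged on activity alone.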

When a scene change is detected the following steps are 
taken: 

1. A new GOP is started. 

2. Default values are used to establish budgets for I, P and
B frames rather than basing the budgets on the
previously coded frame of the same type (see Section H
below). The budgets are established so that the budget
for an I frame is twice the budget for a P frame and four
times the budget for a B frame. The budget for the I
frame is determined from factors such as intra-activity
and VBV occupancy.
3. After the second reference frame of the new GOP is
encoded (usually a P frame), rate control variables are
scaled using the first frame result.

4. The average of TA is set to the value of TA for this first
frame. The TA variance is set to the difference between the
value of TA of this first frame and the average of the
previous scene. Similarly, the average motion estimation
score is set to the motion estimation score for this first
frame. The motion estimation variance is set to the
difference between the motion estimation score of this
first frame and the average of the previous scene.

5. The rate control quantization scale factor Q^R for an
initial macroblock in the first frame of each type (I, P,
B) in the new scene is a function of TA and the bit
budget for the frame, i.e., Q^R = f(bit budget, TA). The
values of this function are stored in a table and may be
accessed by the master and slave controllers. In
general, this rate control quantization scale factor varies
inversely with the bit budget and directly with the
total frame activity. Sample values of the function f(bit
budget, TA) are plotted in FIG. 5. Once an initial value
of Q^R is determined for I, P and B frames according to
f(bit budget, TA), it is possible to determine initial
occupancies for the corresponding virtual buffers v_I, v_P,
v_B. These occupancies are then updated to obtain
subsequent rate control quantization scale factors.

G. Fade Detection 

A fade takes place when the DC value of the luminance 
of a scene varies smoothly with frame number from a 
particular value until the DC luminance value for black is 
reached or the DC luminance value for white is reached. 

FIG. 6 illustrates the DC luminance value as a function of 
frame number for a fade to white and for a fade to black. 

The preprocessor compares the DC values of successive 
frames to detect a fade. (Instead of the DC value, activity 
measures for successive frames may be compared to detect 
the fade). 
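The comparison of successive DC values can be sketched as below. It assumes one DC luminance value per frame and treats a fade as a run of `window` frame-to-frame changes that all have the same sign and similar magnitudes. `window`, `min_step`, and `max_ratio` are hypothetical tuning parameters, and the check that the black or white level has actually been reached is omitted.

```python
from typing import Optional, Sequence


def detect_fade(dc: Sequence[float], window: int = 5, min_step: float = 1.0,
                max_ratio: float = 3.0) -> Optional[str]:
    """Return "black" or "white" if the last `window` frame-to-frame changes
    of the DC luminance move steadily in one direction, else None."""
    if len(dc) < window + 1:
        return None
    recent = list(dc[-(window + 1):])
    steps = [b - a for a, b in zip(recent, recent[1:])]
    if all(s <= -min_step for s in steps):
        direction = "black"  # luminance falling toward black
    elif all(s >= min_step for s in steps):
        direction = "white"  # luminance rising toward white
    else:
        return None
    mags = [abs(s) for s in steps]
    # "Varies smoothly": no step much larger than the others.
    if max(mags) > max_ratio * min(mags):
        return None
    return direction
```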

The preprocessor sends an indication of a fade to the
controller in the master unit of the encoder. The controller
uses the information to determine the sequence of I, P, and
B frames. In general, B frames do not work well in the
presence of a fade and result in a large number of bits. In
general, P frames with motion vectors at or near zero are
utilized, or else I frames are utilized. Preferably, a reduced
search range is used to find the motion vector. In the case a
B frame is utilized, instead of receiving the normal budget
BB_B for a B frame, the frame receives BB_P, the normal
budget for a P frame (see Section H below).

H. Rate Control 

H.1 Bit Budget Determination

Let R be the channel bit rate (bits/second) as specified in
the sequence header, and f be the nominal frame rate
(frames/second) specified in the sequence header. The
effective coding rate R_eff will vary over time from R when
statistical multiplexing causes a plurality of encoders to
share the channel rate R and/or variable bit rate (VBR)
coding is used. Similarly, the effective frame rate f_eff may
vary from the nominal frame rate f, such as under inverse
telecine processing. The value f_eff is considered to be a
windowed time average.

Let K_I, K_P, and K_B be the number of intra, predicted, and
bidirectional frames per second (K_I + K_P + K_B = f_eff). These
numbers depend on the group of pictures (GOP) structure
and coding bit rate, and therefore vary with f_eff or changing
GOP structure.

Let S_j and Q̄_j be the number of bits and average
quantization scale factor used for coding the most recent
frame of type j. Then BB_i, the bit budget for a frame of
type i, is given by

$$ BB_i = \frac{S_i \, \bar{Q}_i \, R_{eff}}{\sum_{j \in \{I,P,B\}} K_j \, S_j \, \bar{Q}_j} \qquad (1) $$
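Equation (1) gives each frame type a budget proportional to its complexity S_i·Q̄_i, divided by Σ_j K_j·S_j·Q̄_j, so that the budgets for one second's worth of frames add up to the effective rate: Σ_i K_i·BB_i = R_eff. A direct transcription follows; the statistics in the usage lines are made up for illustration.

```python
from typing import Dict

FRAME_TYPES = ("I", "P", "B")


def frame_budgets(r_eff: float, k: Dict[str, float], s: Dict[str, float],
                  q: Dict[str, float]) -> Dict[str, float]:
    """Bit budget BB_i for each frame type i.

    r_eff: effective coding rate (bits/s)
    k:     frames of each type per second (K_I + K_P + K_B = f_eff)
    s, q:  bits and average quantization scale of the most recent frame of each type
    """
    denom = sum(k[j] * s[j] * q[j] for j in FRAME_TYPES)
    return {i: s[i] * q[i] * r_eff / denom for i in FRAME_TYPES}


if __name__ == "__main__":
    k = {"I": 2, "P": 8, "B": 20}                  # 30 frames/s
    s = {"I": 400_000, "P": 150_000, "B": 60_000}
    q = {"I": 10, "P": 12, "B": 14}
    bb = frame_budgets(4_000_000, k, s, q)
    print({t: round(v) for t, v in bb.items()})
    print(sum(k[t] * bb[t] for t in FRAME_TYPES))  # matches r_eff up to rounding
```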



Constant bit rate (CBR) rate control is achieved through
a feedback mechanism by monitoring bit spending and
appropriately adjusting the quantization step size. For P and
B pictures, the frame budget BB_i is first distributed among
the sections such that section k gets a bit budget BB_i^(k):

$$ BB_i^{(k)} = \frac{C_i^{(k)}}{\sum_{m=1}^{M} C_i^{(m)}} \, BB_i \qquad (2) $$



where M is the number of sections, and C_i^(k) is the
complexity measure for section k for frame type i, computed
as follows:

$$ C_i^{(k)} = \sum_{n=1}^{N_k} Q_n^R \, s_n \qquad (3) $$

where N_k is the number of macroblocks coded by section
k, Q_n^R is the rate control quantization scale factor for
macroblock n, and s_n is the number of bits used for coding
macroblock n. Note that the complexity measure used in
Equation (2) and Equation (3) corresponds to the most recent
frame of the same type.

Each slave unit (see FIG. 1) then distributes its budget
among its macroblocks in proportion to the total activity
measure ta_n of each macroblock n. In intra pictures, and in
P and B pictures following a scene change, the bit budget is
distributed among sections in proportion to the total intra
activity of each section. The total activity of a section is the
sum of the total activities of the macroblocks in the section.
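The two-level distribution can be sketched as proportional splits: first the frame budget over sections by complexity, then each section's budget over its macroblocks by total activity ta_n. This assumes that a section's complexity is the sum of Q_n^R·s_n over the macroblocks it coded in the most recent frame of the same type; the helper names and the numbers are illustrative.

```python
from typing import List, Sequence


def section_complexity(q_r: Sequence[float], bits: Sequence[float]) -> float:
    """C_i^(k) for one section: sum of Q_n^R * s_n over its macroblocks."""
    return sum(q * b for q, b in zip(q_r, bits))


def split_proportional(total: float, weights: Sequence[float]) -> List[float]:
    """Split `total` in proportion to non-negative weights (equal split if all zero)."""
    w_sum = sum(weights)
    if w_sum <= 0:
        return [total / len(weights)] * len(weights)
    return [total * w / w_sum for w in weights]


if __name__ == "__main__":
    # Previous P frame statistics per section: (Q^R per macroblock, bits per macroblock).
    prev = [([10, 12, 11], [900, 700, 800]), ([14, 14], [300, 350])]
    complexities = [section_complexity(q, b) for q, b in prev]
    section_budgets = split_proportional(120_000, complexities)
    # Within section 0, split by each macroblock's total activity ta_n.
    mb_budgets = split_proportional(section_budgets[0], [40.0, 25.0, 35.0])
    print(section_budgets, mb_budgets)
```

For intra pictures and pictures after a scene change, the same `split_proportional` helper would be called with the sections' total intra activities as weights.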

H.2 Virtual Buffers for Determining Quantization Level

Virtual buffers v_I, v_P, and v_B (corresponding to frame
types I, P, and B) provide rate control feedback by adjusting
the rate control quantization scale factor. The virtual buffers
provide a mechanism to relate the number of bits allocated
to a macroblock and the quantization scale factor selected
for that macroblock. The rate control quantization scale
factor for a macroblock n in a frame of type j is
determined by the ratio of virtual buffer fullness (denoted
vbf_j) to virtual buffer size (denoted vbs_j) in the following
way:

$$ Q_n = QS_n \cdot f\left(\frac{vbf_j}{vbs_j}\right) \qquad (4) $$

where f(x) is a smooth function that returns the
minimum rate control quantization scale factor value Q_min
for x ≤ 0, and the maximum rate control quantization scale
factor value Q_max for x ≥ 1. The function f(x) is implemented
as a table with 256 entries. Two such tables, one for linear
(i.e., MPEG-1) and one for non-linear quantization, may be
provided. The QS_n factor is an additional quantization scale
factor based on the masking activity (see Section H.6
below).
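A table-driven f(x) might look like the sketch below. The source specifies only a smooth, 256-entry table that saturates at Q_min and Q_max; the smoothstep shape used to fill it is an assumption, as are the function names. `initial_fullness` shows one way the initial Q^R chosen after a scene change (Section F) could be turned into an initial virtual buffer occupancy: invert the table. The final Q_n is this value multiplied by QS_n.

```python
from typing import List


def make_q_table(q_min: int, q_max: int, size: int = 256) -> List[int]:
    """Hypothetical smooth f(x): smoothstep ramp from q_min (x = 0) to q_max (x = 1)."""
    table = []
    for i in range(size):
        x = i / (size - 1)
        table.append(round(q_min + (q_max - q_min) * x * x * (3 - 2 * x)))
    return table


def rate_control_q(table: List[int], vbf: float, vbs: float) -> int:
    """f(vbf / vbs), clamped so x <= 0 gives q_min and x >= 1 gives q_max."""
    x = min(max(vbf / vbs, 0.0), 1.0)
    return table[round(x * (len(table) - 1))]


def initial_fullness(table: List[int], q_target: int, vbs: float) -> float:
    """Invert the table: smallest fullness whose rate control Q reaches q_target."""
    for i, q in enumerate(table):
        if q >= q_target:
            return vbs * i / (len(table) - 1)
    return float(vbs)
```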

H.3 Video Buffer Verifier (VBV) 

All MPEG-2 bit streams are required to comply with the
Video Buffer Verifier (VBV) rules described in the MPEG-2
standard. The VBV buffer is a virtual buffer which is
maintained by the controller in the master unit. The VBV
buffer is a model of the input buffer at the decoder. The
encoder allocates bits to frames such that the VBV buffer
does not overflow or underflow in the case of constant bit
rate encoding. (In the case of variable bit rate encoding, it is
only necessary to prevent VBV underflow.) The MPEG-2
standard specifies the manner in which bits are placed into
and removed from the VBV buffer.

Specifically, the removal of the bits from the VBV buffer
is instantaneous (implying that the decoder decodes the
picture instantaneously). In this respect, all realizable
decoder buffers deviate from the hypothetical VBV decoder
buffer.

Apart from this instantaneous decoding model, the
encoder's physical buffer is a mirror image of the VBV buffer:
a VBV underflow implies an encoder physical buffer
overflow, i.e., too many bits generated at the encoder.

The VBV has a receiving buffer of size B bits. The
fullness of the VBV buffer at any time is denoted by d (or
by d(i) after picture i has been removed from the buffer). The
physical encoder buffer (with fullness denoted by P) differs
from the VBV buffer in that, while it is filled at the same rate
as the VBV buffer, the bits s_n spent to encode each
macroblock n are removed from it as soon as that macroblock
is encoded. Therefore, the two buffers have the same
occupancy after a picture has been removed from the VBV
buffer.
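The constant bit rate VBV bookkeeping can be sketched as below. The sketch assumes a fixed picture period of 1/frame_rate, an initial fullness given at the first decoding instant, and instantaneous removal of each picture; the buffer is checked against its size B just before the next removal, when it is fullest. The function name and the error handling are illustrative.

```python
from typing import List, Sequence


def vbv_trace(picture_bits: Sequence[int], rate: float, frame_rate: float,
              size: float, initial: float) -> List[float]:
    """Constant bit rate VBV model with instantaneous picture removal.

    Returns d(i), the fullness just after picture i is removed; raises
    ValueError on underflow (picture not fully received) or overflow.
    """
    per_picture = rate / frame_rate  # bits arriving during one picture period
    d = float(initial)               # fullness when the first picture is decoded
    trace = []
    for i, bits in enumerate(picture_bits):
        if bits > d:
            raise ValueError(f"VBV underflow at picture {i}")
        d -= bits
        trace.append(d)
        d += per_picture
        if d > size:
            raise ValueError(f"VBV overflow after picture {i}")
    return trace
```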

H.4 VBV Underflow Protection 

There are two levels of actions taken to prevent VBV
underflow. After the number of bits used by the previous
picture is known, the bit budget for the current picture is
reduced if necessary so as to fit in the VBV buffer, i.e.,

$$ d + \frac{R_{eff}}{f_{eff}} - BB_i \ge 0 \qquad (5) $$



The second level of action takes place at the macroblock 
level, where the rate control initiates what is called the panic 
mode. 
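Both picture-level checks can be folded into one clamp. This sketch assumes the constant bit rate VBV model above, with d the fullness just after the previous picture is removed. On the underflow side, the budget may not exceed d + R_eff/f_eff (the picture must fit in the buffer when it is decoded). For the overflow protection of Section H.5, the budget must be at least d + 2·R_eff/f_eff − B, so that the buffer can absorb the bits arriving during the following picture period. `margin` is a hypothetical safety allowance.

```python
def clamp_budget(bb: float, d: float, rate: float, frame_rate: float,
                 size: float, margin: float = 0.0) -> float:
    """Adjust a picture's bit budget so the CBR VBV neither underflows nor overflows.

    d is the VBV fullness just after the previous picture was removed.
    """
    per_picture = rate / frame_rate
    # Underflow side: the picture must fit in what the buffer holds at decode time.
    upper = d + per_picture - margin
    # Overflow side: after removal there must still be room for the bits that
    # arrive before the next picture is removed.
    lower = d + 2 * per_picture - size
    return max(lower, min(bb, upper))
```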

During encoding, a scene can be encountered which 
generates too many bits, even when the quantization scale 
factor is set to the maximum value. The encoder is in danger 
of generating too many bits for the channel to transfer to the 
decoder and causing a "VBV underflow" bitstream error. In 
this case, the encoder enters the "panic" mode in which 
quality is sacrificed to guarantee a legal bit stream. 

The panic trigger mechanism is related to the encoder 
model of decoder buffer fullness (i.e. the VBV buffer). The 
encoder updates its model of decoder buffer fullness as each 
macroblock is encoded by subtracting the bits utilized to 
code the macroblock. The VBV is also updated by adding 
the bits transferred by the constant bit rate channel. 

At the last macroblock in a frame, the encoder needs to
ensure that the number of bits generated for the rest of the
frame is at or below a relatively small level to ensure that
there will be no VBV underflow during the rest of the frame.
At the beginning of the frame, the encoder needs to ensure
that the actual number of bits generated is below a much
larger level to prevent VBV buffer underflow during the rest
of the frame. In the panic mode, to ensure that there is no
VBV buffer underflow, all non-intra macroblocks are of type
not coded, i.e., they have no coded block patterns. This
means that the predictive error macroblocks are not coded.
Intra macroblocks have a DC and a small (programmable)
number of AC discrete cosine transform coefficients.
Furthermore, the intra-bias (see Section C above) is
increased to favor non-intra macroblocks.
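The macroblock-level bookkeeping can be sketched together with the "optimal" entry condition worked out in the paragraphs that follow. The sketch assumes the physical buffer gains R_eff/(f_eff·K) bits per macroblock period and loses the s_n bits of each coded macroblock, as described in Section H.3. The class and method names are hypothetical, N_max is supplied by the caller, and the simpler test P_n ≤ (K − n)·N_max also stated in the text could be substituted in `should_panic`.

```python
class PanicMonitor:
    """Tracks the physical buffer fullness P during one picture and applies
    the panic entry test P < k * (N_max - R_eff / (f_eff * K))."""

    def __init__(self, d: float, rate: float, frame_rate: float,
                 mbs_per_picture: int, n_max: float):
        self.p = float(d)                                  # P_0 = d
        self.fill = rate / (frame_rate * mbs_per_picture)  # channel bits per macroblock
        self.total = mbs_per_picture                       # K
        self.n_max = n_max                                 # worst-case panic bits per macroblock
        self.coded = 0

    def should_panic(self) -> bool:
        remaining = self.total - self.coded                # k
        return self.p < remaining * (self.n_max - self.fill)

    def after_macroblock(self, bits: int) -> None:
        """Add one macroblock period of channel input and remove the s_n spent bits."""
        self.p += self.fill - bits
        self.coded += 1
```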

The condition as to when to start the panic mode is now
more carefully considered. Let K be the number of
macroblocks in a picture, and let N_max be the estimate of the
maximum number of bits generated per macroblock when in
panic mode (this will in general depend on the current picture
type and how many AC coefficients are retained for intra
macroblocks). Let k be the number of macroblocks
remaining in the current frame. Clearly, we need k·N_max bits
remaining in the VBV buffer to be able to handle the worst
case. On the other hand, the VBV buffer is filled at a rate of
R_eff/(f_eff·K) bits per macroblock. Therefore, the "optimal"
strategy is to enter panic mode when the physical buffer has
less than k·(N_max − R_eff/(f_eff·K)) bits with k macroblocks
remaining to encode. This is implemented as follows:

Prior to encoding, the physical buffer fullness P_0 is set to
the VBV buffer fullness d. Then, P_n is updated for each
macroblock in a frame as:



(6) 



The panic mode is enabled as long as P_n ≤ (K − n)·N_max.

As indicated above, the quantization step size is a product
of a weighting matrix W[v][u] and a quantization scale
factor Q_n (also known as mquant in the MPEG-2 standard).
There are 31 possible (Q_n)·(W[v][u]) matrices in accordance
with MPEG-2. Illustratively, these matrices are all
stored at the encoder. In addition, one or more panic matrices
may be stored. For example, in the panic mode, if it is
desirable to encode only the DC transform coefficient, the
transform coefficients F[v][u] may be multiplied by an
element from a panic matrix p_0[v][u], whose only non-zero
value is p_0[0][0]. If the panic is less severe, a different panic
matrix p_1[v][u] may be utilized which has a few non-zero AC
values in addition to the non-zero DC value. This
permits the intra blocks to be coded with a DC coefficient and
a few non-zero AC coefficients. In general, a plurality of
panic matrices may be stored at the encoder with different
numbers of non-zero AC coefficients. Depending on the
severity of the panic, a particular panic matrix with a
particular number of non-zero AC coefficients is selected.

H.5 VBV Overflow Protection

VBV overflow implies that the encoder is not generating
enough bits to keep the physical encoder buffer from
emptying. The solution is to produce the right number of stuffing
bits. The stuffing can be done at the picture level (called zero
stuffing) or at the macroblock level (called macroblock
stuffing). MPEG-2 disallows macroblock stuffing.

Similar to underflow protection, the action is taken at two
levels. The bit budget BB_i of frame i is increased if
necessary so that

$$ d + \frac{R_{eff}}{f_{eff}} - BB_i \le B - \frac{R_{eff}}{f_{eff}} \qquad (7) $$

where B is the size of the VBV.

Zero stuffing is done by the master unit controller after a
picture is coded. If

$$ d > B - \frac{R_{eff}}{f_{eff}} - \text{safety margin} \qquad (8) $$

then zero bytes are stuffed to bring the occupancy to this
level. The safety margin is provided to take care of bit rate
fluctuations, i.e., a mismatch between the encoder and
channel clocks.

H.6 Adaptive Quantization

The masking activity quantization scale factor for a
macroblock n is given by

QS^aima^ )4W(ra2„+oc(m<2, 3 + 1 fc 1 ))

where ma_avg is the average masking activity over the entire
picture, α is 4, and k_1 is a constant to correct for the case
where ma_n is small. Illustratively, k_1 is 5% to 10% of ma_avg.

The final quantization scale factor Q_n (also known as
mquant) used to encode a macroblock is

$$ Q_n = QS_n \cdot Q_n^R \qquad (10) $$

where Q_n^R is the rate control quantization scale factor.

H.7 Statistical Multiplexing and Rate Changes

The rate control technique described above for allocating
bits to frames depends on the effective bit rate R_eff. More
specifically, the bit budget BB_i provided for each frame
depends on the effective bit rate R_eff. In addition, VBV
enforcement to prevent VBV overflow or underflow depends
on the rate R_eff.

The rate R_eff may change under a variety of
circumstances.

For example, a plurality of constant bit rate encoders may
share a channel. A statistical multiplexer controller allocates
a portion of the channel bandwidth to each encoder. At
various times, the portion of channel bandwidth allocated to
a particular encoder may change. Alternatively, in the case of
a variable bit rate encoder, the bit rate used to encode the bit
stream may change from time to time. This rate change has
to be accounted for in the rate control algorithm.

(a) Introduction

A statistical multiplexing system consists of N variable
bitrate streams sharing a channel with constant capacity R
(called the bundle rate). Let R_i(t) denote the bitrate at
which stream i is being served at time t. Then the
following must hold for all t:

$$ \sum_{i=1}^{N} R_i(t) \le R $$

The allocation of R among the N channels is done to
minimize an overall cost function, and is dynamic in
general. The following questions summarize the issues faced in
the design of a statistical multiplexing system:

1. What is the "best" distortion measure? (Distortion
refers to the difference between a coded picture and the
actual picture; it is the opposite of quality.)

2. What is the optimal allocation of rates that minimizes
the distortion measure?

3. How does one achieve the optimal allocation given an
initial allocation?

It should be noted that the optimal allocation of bits
depends on this cost function. Let d_i(t) be the distortion of
stream i at time t. Then two popular choices for the cost
function are "minimum total distortion" and "minimax
distortion". In the former case, one minimizes

$$ J = \sum_{i=1}^{N} d_i $$

which requires the local rate-distortion functions for
each stream to be known. In the latter case, the function

$$ J = \max_i d_i $$

has a much simpler solution, since it is clearly minimized
when all distortions are equal. This cost function will be
assumed in the following discussion, although any cost
function may be accommodated in general.
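Under the minimax cost, the allocation reduces to finding one distortion level that all streams share and that exactly uses the bundle rate. The sketch below finds that level by bisection. It assumes each stream can report the rate it would need to reach a given distortion, and that this rate falls as the allowed distortion rises; the function name is illustrative, and the inverse-proportional model in the usage lines is purely hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def minimax_allocation(rate_needed: Sequence[Callable[[float], float]], bundle_rate: float,
                       d_lo: float, d_hi: float, iters: int = 100) -> Tuple[float, List[float]]:
    """Find the common distortion D at which the streams together need at most
    bundle_rate bits/s; rate_needed[i](D) must decrease as D grows."""
    for _ in range(iters):
        mid = 0.5 * (d_lo + d_hi)
        if sum(r(mid) for r in rate_needed) > bundle_rate:
            d_lo = mid  # too many bits requested: accept more distortion
        else:
            d_hi = mid  # fits in the bundle: try less distortion
    return d_hi, [r(d_hi) for r in rate_needed]


if __name__ == "__main__":
    # Hypothetical model: stream i needs c_i / D bits/s to reach distortion D.
    streams = [lambda d, c=c: c / d for c in (1e6, 3e6)]
    d, rates = minimax_allocation(streams, 4e6, 1e-3, 1e3)
    print(round(d, 6), [round(r) for r in rates])
```

Returning the upper end of the bracket guarantees the allocation never exceeds the bundle rate, at the cost of a distortion marginally above the exact equal-distortion point.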

Section (b) focuses on the allowable range of rates for a 
given stream. Section (c) proposes a statistical multiplexing 
system that converges to the desired optimal allocation in 
several steps. 

Consider the end-to-end system illustrated in FIG. 7. The
end-to-end system 200 comprises the encoder system 202
(including preprocessor, master unit and one or more slave
units), a physical encoder buffer 204, a channel 206, a
decoder input buffer 208, and a decoder 210. The decoder
buffer 208 is modeled at the encoder by the VBV buffer; thus
the decoder input buffer 208 is labeled VBV.
(b) Constraints for Rate Change 

The rate change has to ensure the following: 

1. The decoder buffer cannot overflow or underflow. 

2. The encoder buffer cannot overflow. 

3. End-to-end system delay must be constant. 

Here we assume that the channel has some means (such
as null packet insertion) to handle the case of encoder buffer
underflow. When the encoder buffer is appropriately sized,
the second condition is automatically satisfied when the
end-to-end delay constraint is met. The proper size for the
encoder buffer is Δ·R_eff,max, where Δ is the desired end-to-end
delay and R_eff,max is the maximum allowable channel rate.
Note that this is the abstract encoder buffer size, and the
physical buffer has to be larger to allow for encoding delays,
inverse 3:2 pulldown, etc.

Let the channel rate at time t be R(t), the encoder buffer
fullness E(t), and the decoder buffer fullness D(t). The
end-to-end system delay Δ has two components: encoder
buffer delay Δ_e and decoder buffer delay Δ_d. The former is
proportional to the encoder buffer occupancy and inversely
proportional to the bitrate, i.e.,

$$ \Delta_e(t) = \frac{E(t)}{R(t)} $$

The decoder buffer delay is of course Δ − Δ_e(t), as it serves
to remove the jitter introduced by the encoder buffer. Notice
that a step change in the bitrate causes a step change in the
encoder delay, and this has to be absorbed by the decoder
buffer.

(b)(1) The Perfect Case

In this section, we assume an encoder that can encode a
frame instantly, i.e., with no delay. Furthermore, we assume
that the statistical multiplexing control announces the rate
changes in advance.

Assume the rate is R_eff,0 for t < T_1 and that it changes to
R_eff,1 at T_1. In that case, the E(T_1) bits that were put in
the encoder buffer while the rate was R_eff,0 will be drained
at a different rate, R_eff,1. If the encoder is notified about the
rate change in advance, it can start outputting bits at the
correct rate before the rate change occurs at the channel.

How much advance notice does the encoder require? It
needs to be notified early enough that all the bits currently
in the buffer can be emptied at the current rate. Let the
encoder be notified δ seconds before T_1, the time at which
the channel rate changes. Then the encoder buffer cannot
contain more than R_eff,0·δ bits, as this is the number of bits
that will be drained before the rate change. Therefore, the
advance notice is the smallest δ that satisfies

$$ E(T_1 - \delta) \le R_{eff,0} \, \delta $$

(b)(2) Instant Encoding, No Advance Notice

Now let us consider a more realistic case, where the
channel rate changes with no advance notice. What are the
largest and smallest bitrates that can be accommodated in
terms of encoder and decoder buffer fullness?

First, assume that the new rate R_eff,1 is greater than R_eff,0.
Then there exists the possibility that this unforeseen rate
change will overflow the decoder buffer and/or underflow
the encoder buffer.

At the time of the rate change, the encoder has E(T_1) bits,
which would have taken d_0 seconds to drain, where
d_0 = E(T_1)/R_eff,0. At the new rate, it takes
d_1 = E(T_1)/R_eff,1 seconds to drain.
This causes a step increase in the decoder buffer fullness by
the amount of d_1(R_eff,1 − R_eff,0). From then on, the encoder
can adjust the coding bitrate to avoid a decoder buffer
overflow. At the maximum allowable rate R*_eff,1, the
decoder buffer reaches its maximum fullness B:

$$ D(T_1) + \frac{E(T_1)}{R^*_{eff,1}} \left( R^*_{eff,1} - R_{eff,0} \right) = B $$

Or, the maximum rate change that can be tolerated by the
decoder at time T_1 is no smaller than

$$ R^*_{eff,1} = \frac{E(T_1) \, R_{eff,0}}{E(T_1) - \left( B - D(T_1) \right)} $$

Notice that the R*_eff,1 we have computed is not
necessarily the largest rate that can be accommodated. However, it is
a lower bound for the largest allowable bitrate. This is
because we have considered the worst case, where no bits
are removed from the decoder buffer during time d_1. In
general, d_1 may be many picture periods, and the bits
corresponding to these pictures will be removed from the
decoder buffer as the pictures are decoded. It is then
possible to calculate a tighter lower bound for R*_eff,1 by
considering each picture decoding period within d_1.

Now, let us consider the case where the new rate R_eff,1 is
less than R_eff,0. Then one or both of the following may take
place:

1. The encoder buffer may overflow, causing loss of data. 

2. The decoder buffer may underflow, thereby violating 
the end-to-end delay constraint. 

The first condition is avoidable by appropriately sizing the 
encoder buffer, and will not be considered. (In other words, 
the encoder buffer is sized such that the decoder buffer 
underflows before the encoder buffer overflows). 

Analogous to the previous analysis, let d_1 = E(T_1)/R_eff,1. It
is clear that the end-to-end delay constraint will be violated
if the decoder is allowed to underflow. In other words, when
the decoder buffer underflows, the encoder delay is at least
as large as the end-to-end system delay. Therefore, the
minimum rate R*_eff,1 is given by

$$ R^*_{eff,1} = \frac{E(T_1)}{\Delta} $$

Note that the above equation can be arrived at by looking
at the end-to-end delay constraint. Recall that the total delay
is the sum of encoder buffer delay and decoder buffer delay.
The rate change causes a step change in the encoder buffer
delay. Then the decoder buffer delay must be able to
accommodate this change, i.e., it should be at least as large
as the change in the encoder delay:

$$ \Delta_d = \Delta - \Delta_e = \text{change in } \Delta_e $$

$$ \Delta - \frac{E(T_1)}{R_{eff,0}} = \frac{E(T_1)}{R^*_{eff,1}} - \frac{E(T_1)}{R_{eff,0}} $$

which again gives R*_eff,1 = E(T_1)/Δ.
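The two bounds of this section can be computed together. The sketch assumes the worst case considered in the text (no pictures are removed from the decoder buffer while the E(T_1) bits drain), under which the decoder fullness jumps by d_1(R_eff,1 − R_eff,0) = E(T_1)(1 − R_eff,0/R_eff,1). The lower rate follows from requiring that draining E(T_1) take no longer than the end-to-end delay Δ. The function name and arguments are illustrative.

```python
import math
from typing import Tuple


def rate_change_bounds(e_bits: float, d_bits: float, vbv_size: float,
                       r0: float, delay: float) -> Tuple[float, float]:
    """Bounds on an unannounced rate change at T_1.

    e_bits: encoder buffer fullness E(T_1); d_bits: decoder buffer fullness D(T_1);
    vbv_size: decoder buffer size B; r0: old rate R_eff,0; delay: end-to-end delay.
    Returns (minimum rate, lower bound on the maximum rate).
    """
    r_min = e_bits / delay  # draining E(T_1) must not take longer than the delay
    headroom = vbv_size - d_bits
    if e_bits <= headroom:
        # The step increase E(T_1) * (1 - r0 / r1) stays below E(T_1) <= headroom.
        return r_min, math.inf
    # Worst case: solve E(T_1) * (1 - r0 / r1) = headroom for r1.
    return r_min, r0 * e_bits / (e_bits - headroom)
```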



(c) Statistical Multiplexing Control 

Given an optimal rate allocation among the streams and
rate constraints for each stream, how do we apply this to a
statistical multiplexing system? We consider a typical
scenario of N encoders communicating with a central control
unit. Illustratively, consider that all encoders are frame
synchronized. This does not lead to a loss of generality, since
the encoders can be frame synchronized by the addition of
appropriate amounts of delay. The presence of inverse
telecine adds a field delay when a field is dropped, but this
can also be handled in a similar fashion.

Such a statistical multiplexing system is shown in FIG.
7A. A plurality of encoder systems 200-1, 200-2, . . . , 200-N
are controlled by a central controller 210. Each encoder
system 200 comprises an encoder 202 and an encoder buffer
204. The internal structure of an encoder is described above
in connection with FIG. 1A and FIG. 1B. The central
controller 210 may communicate with the local controller
which is part of each encoder 202. The central controller
may be implemented using a general purpose CPU or DSP
(Digital Signal Processor) or implemented as a dedicated
integrated circuit. The algorithms implemented by the
dedicated central controller are described below in connection
with FIGS. 7B and 7C. Each encoder 200 reports its status
to the central controller 210 after encoding a frame, and
accepts any changes to its operating parameters (such as
bitrate).

The task of optimal rate allocation is shared between the
central control unit and each encoder in the following way,
illustrated in FIG. 7B.

After completing the encoding of the current frame, each
encoder computes the upper and lower rate limits
considering its VBV buffer fullness (step 302 of FIG. 7B). In the
meantime, the central control unit has already computed the
desired rate allocation without any buffer constraints (step
304 of FIG. 7B). This may take as long as a frame time.
Then, for each encoder, the optimal rate is clamped to be
within its allowable rate boundaries (step 306 of FIG. 7B).
In general, as a result of this clamping, the sum of individual
rates will not be equal to the bundle rate (i.e., the total
channel rate R).

Let's consider the case where the sum of the adjusted rates
is less than the bundle rate (this will occur if many
encoders are clamped at their upper limits). The simplest
solution is for the central controller to set aside all encoders
that have been clamped, and rescale the rest such that the sum
of all the encoders is equal to the bundle rate (step 308 of
FIG. 7B). Note that the rescaling may cause some encoders
to fall outside their allowable range (upper range in this
example). Therefore, this procedure needs to be iterated by
the central controller until all the constraints are satisfied and
the sum of the individual rates is equal to the bundle rate
(step 310 of FIG. 7B).
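The clamp-and-rescale loop of FIG. 7B might be sketched as follows. Instead of setting aside every clamped encoder, this version sets aside only those pinned at the limit in the direction the rates must move (upper limits when the sum falls short of the bundle rate, lower limits when it exceeds it), which is one reading of the refinement discussed next. The function name, tolerance, and iteration cap are hypothetical.

```python
from typing import List, Sequence


def clamp_and_rescale(desired: Sequence[float], lower: Sequence[float],
                      upper: Sequence[float], bundle: float,
                      max_iter: int = 50) -> List[float]:
    """Clamp each desired rate to [lower, upper], then repeatedly rescale the
    encoders that are not pinned at a limit until the rates sum to the bundle rate."""
    rates = [min(max(r, lo), hi) for r, lo, hi in zip(desired, lower, upper)]
    for _ in range(max_iter):
        total = sum(rates)
        if abs(total - bundle) <= 1e-9 * bundle:
            break
        growing = total < bundle
        # Set aside encoders already at the limit we would push them past.
        pinned = [r >= hi if growing else r <= lo for r, lo, hi in zip(rates, lower, upper)]
        free = sum(r for r, p in zip(rates, pinned) if not p)
        if free == 0:
            break  # every encoder is pinned; the bundle rate cannot be met
        scale = (bundle - (total - free)) / free
        rates = [r if p else min(max(r * scale, lo), hi)
                 for r, p, lo, hi in zip(rates, pinned, lower, upper)]
    return rates
```

For example, with desired rates (10, 10, 10), upper limits (5, 12, 20), and a bundle rate of 30, the first pass gives (5, 12, 12.5) after re-clamping, and the second pass settles at (5, 12, 13).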

This basic algorithm may be further improved. First, in
the earlier example, it is suboptimal to set aside the encoders
that have been clamped at their lower limits, because
the rescaling may cause the desired rate for such an encoder to
rise above the clamped value. An intermediate solution may
involve clamping those encoders that have a large difference
between their lower limits and desired operating rates, and
allowing the others to be rescaled.

This method provides a good approximation to the
optimal operating point. Now we need a method to make it
possible to actually converge to the optimal point. This
method is carried out by the central controller using the
algorithm illustrated in FIG. 7C.

Each encoder maintains the desired operating limits for its VBV buffer. The central control unit calculates these values for each encoder and transfers them at every picture (step 320 of FIG. 7C). In particular, when the desired rate differs significantly from the current rate, the buffer fullness may lie outside the desired range. In this case, the central control unit may plan a trajectory for the rate and VBV buffer fullness such that the desired rate is achieved in a finite number of frames (step 322 of FIG. 7C). However, since the optimal rate allocation is a moving target that changes with the input statistics, this point may never be reached in practice. The VBV buffer trajectory is the VBV occupancy as a function of frame number. The rate trajectory of an encoder is the allocated rate as a function of frame number.

The central control unit may get advance information about the changes in each encoder's input. For example, in a typical setup, the frame activities may be available several frames ahead. This may allow sufficient time for a desired VBV buffer fullness to be reached before the rate change needs to take place.

H.8 Inverse Telecine Processing 

In inverse telecine processing, the encoder detects and 
drops (does not encode) repeated fields which are present in 
the 3:2 pulldown sequence. 

The dropped fields affect the rate control algorithm. 

As the frame time is no longer constant (some frames have a duration of two field periods and some frames have a duration of three field periods), a weighted time average of frame periods is determined. Let $T_i$ be the number of fields in frame i. Thus, $T_i$ has the value two or the value three. Then $\bar{T}$, the average number of fields per frame, is

$$\bar{T} = \frac{1}{N}\sum_{j=i-N}^{i-1} T_j \qquad (14)$$

where N is a number of frames.

Let f be the input frame rate as specified in the sequence header.

The effective frame rate $f_e$ is given by

$$f_e = \frac{2f}{\bar{T}} \qquad (15)$$


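Equations (14) and (15) can be checked numerically with a short sketch. It assumes, as in the reconstruction above, that $\bar{T}$ is the mean of $T_j$ over the most recent N frames and that $f_e$ is the field rate 2f divided by $\bar{T}$; the function names are hypothetical.

```python
from typing import Sequence


def average_fields_per_frame(fields: Sequence[int], n: int) -> float:
    """Mean of T_j (2 or 3 fields per frame) over the most recent n frames."""
    window = list(fields)[-n:]
    return sum(window) / len(window)


def effective_frame_rate(fields: Sequence[int], f: float, n: int) -> float:
    """Field rate 2f divided by the average number of fields per frame."""
    return 2.0 * f / average_fields_per_frame(fields, n)


# A steady 3:2 pulldown cadence alternates 3- and 2-field frames (T-bar = 2.5).
print(effective_frame_rate([3, 2, 3, 2], 30000 / 1001, 4))  # about 23.976
```

For a steady 3:2 cadence the sketch recovers the film rate, 4/5 of the nominal 29.97 Hz frame rate.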

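This average frame rate should be used in determining the budget $BB_i$ for each frame i. In particular, in Equation (1), the frame rate f is replaced by the effective frame rate:

$$f \leftarrow f_e \qquad (16)$$

Alternatively, Equation (1) can be rewritten in terms of the number of fields $T_i$ in frame i:

$$BB_i = \left(\frac{R}{f}\right)\left(\frac{T_i}{2}\right)\cdots\sum_{j=I,P,B} K_j S_j Q_j \qquad (17)$$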



When a frame has a duration of three field periods, the VBV buffer is allowed to fill at the regular rate R for three field periods. Therefore, the zero stuffing condition needs to be adjusted so that the VBV buffer does not overflow during this extra field period. Equation (8) is modified so that

$$d_{i-1} > B - \frac{R\,T_i}{2f} - \text{safety margin} \qquad (18)$$

where $d_{i-1}$ is the VBV occupancy after the bits corresponding to picture i-1 have been removed from the buffer and $T_i$ is the number of fields displayed for the following picture (in encode order).
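The adjusted zero-stuffing test can be sketched as a single comparison. This assumes the form of Equation (18) given above, with B taken to be the VBV buffer size and the bits delivered while the next picture is displayed equal to R times $T_i$ field periods at the field rate 2f; the function and parameter names are hypothetical.

```python
def needs_zero_stuffing(d_prev: float, vbv_size: float, rate: float,
                        frame_rate: float, next_fields: int,
                        safety_margin: float) -> bool:
    """Assumed form of Equation (18): stuff if the occupancy d_{i-1} plus the
    bits delivered during next_fields field periods (field rate 2f) would come
    within safety_margin of the VBV buffer size B."""
    bits_delivered = rate * next_fields / (2.0 * frame_rate)
    return d_prev > vbv_size - bits_delivered - safety_margin


# B = 1,835,008 bits, R = 6 Mbit/s, f = 30 Hz, margin = 10,000 bits.
print(needs_zero_stuffing(1_600_000, 1_835_008, 6e6, 30.0, 3, 10_000))  # True
print(needs_zero_stuffing(1_600_000, 1_835_008, 6e6, 30.0, 2, 10_000))  # False
```

With these example numbers the same occupancy triggers stuffing before a three-field picture (threshold 1,525,008 bits) but not before a two-field picture (threshold 1,625,008 bits), which is the extra-field effect the paragraph describes.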

I. Rate Control for Variable Bit Rate Controller

The rate control algorithm described above may be modi- 
fied for use with a variable bit rate, non-real time encoder. 
The variable bit rate non-real time encoder is especially 
useful for generating bit streams to be stored on digital video 
disks (DVDs). 

A variable bit rate, non-real time, multi-pass encoding 
algorithm is described below in connection with FIG. 8. 
This algorithm proceeds as follows:

1. The sequence of input frames is coded in a first coding pass (step 200 of FIG. 8). In the first encoding pass, VBV enforcement is disabled. Moreover, the rate control quantization scale factor is fixed; it is not a function of virtual buffer fullness. The fixed rate control quantization scale factor is multiplied by a variable masking activity quantization scale factor for each macroblock in a frame.

The result of the first pass is that a number of bits $B_i$ is determined for each frame i in the input sequence of frames to be encoded (step 202 of FIG. 8).

2. Given the number of bits $B_i$ used to encode each frame i in the first pass (step 202 of FIG. 8), the total number of bits from the first pass is

$$B_{Total} = \sum_{i=0}^{L-1} B_i$$

where L is the total number of frames in the input sequence.

Suppose the desired target is $B_{Target}$ bits in L frames. Let $Z_i$ represent the unavoidable total overhead bits for frame i. The number $Z_i$ in general depends on the frame type. Thus,

$$B_{Total} - \sum_{i=0}^{L-1} Z_i$$

represents the number of discretionary bits from the first pass.

The number of bits $B_i$ used to code each frame in the first pass is then scaled (step 204 of FIG. 8) by a scale factor $\alpha$, where

$$\alpha = \frac{B_{Target} - \sum_{i=0}^{L-1} Z_i}{B_{Total} - \sum_{i=0}^{L-1} Z_i}$$

and

$$B_i' = Z_i + \alpha\,(B_i - Z_i)$$
If $B_i'$ were to be used as the bit budget for each frame i, the constraint of not exceeding the target number of bits $B_{Target}$ is satisfied. In addition, a desired average bit rate is also met by the bit budgets $B_i'$. However, there is no assurance that the bit budgets $B_i'$ will not exceed a maximum bit rate $R_{max}$. If an encoder generates bits at a rate that exceeds $R_{max}$, it will lead to an underflow of the decoder buffer (and the VBV) because more bits are being generated than can be transferred to the decoder buffer.
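The first-pass rescaling of step 2 is a one-line transformation, sketched below under the assumption that the first-pass bit counts $B_i$ and overhead estimates $Z_i$ are given as lists; `scale_first_pass` is a hypothetical name. Only the discretionary part $B_i - Z_i$ is scaled, so the overhead bits are preserved and the scaled budgets sum exactly to $B_{Target}$.

```python
from typing import List, Sequence


def scale_first_pass(bits: Sequence[float], overhead: Sequence[float],
                     target: float) -> List[float]:
    """B_i' = Z_i + alpha * (B_i - Z_i), with alpha chosen so that the
    discretionary bits are rescaled to fit the target total."""
    z_total = sum(overhead)
    alpha = (target - z_total) / (sum(bits) - z_total)
    return [z + alpha * (b - z) for b, z in zip(bits, overhead)]


budgets = scale_first_pass([100.0, 200.0, 300.0], [10.0, 10.0, 10.0], 450.0)
print(sum(budgets))  # equals the 450-bit target up to rounding
```

Summing $B_i'$ gives $\sum Z_i + \alpha(B_{Total} - \sum Z_i) = B_{Target}$, which is why the target constraint is met; nothing in this step bounds the local rate, hence the VBV correction that follows.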

3. To solve this problem, the encoder generates the VBV trajectory that would result from the bit budgets $B_i'$ (step 206 of FIG. 8). Such a trajectory is shown in FIG. 8A, which plots VBV occupancy as a function of frame number. OVR designates the maximum VBV capacity. UNDR indicates the minimum VBV capacity. Thus, when the trajectory exceeds OVR, there is a VBV overflow. When the trajectory falls below UNDR, there is a VBV underflow.

No steps are taken to prevent VBV overflow. Rather, when a DVD is being read and the VBV occupancy reaches OVR, no further bits are taken from the disk player. Thus, the VBV trajectory is clipped. As indicated above, VBV overflows are permitted for a VBR encoder according to the MPEG-2 standard.

The following steps are used to correct the budgets $B_i'$ to eliminate VBV underflows and thereby ensure that a maximum channel rate $R_{max}$ is not exceeded.

(a) find the intervals [a,b] between OVR crossings in which there is VBV underflow;

(b) for each interval [a,b] between OVR crossings, find the minimum VBV occupancy and the undershoot $D_{[a,b]}$, where $D_{[a,b]}$ is the number of bits by which the VBV occupancy is below UNDR at the minimum (step 208 of FIG. 8);

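(c) The total number of bits allocated to the interval [a,b] is

$$B_{Total[a,b]} = \sum_{i=a}^{b} B_i'$$

Generate the bit budgets $BB_i$ for each frame according to

$$BB_i = B_i' \quad \text{outside the intervals } [a,b]$$

$$BB_i = \frac{B_{Total[a,b]} - D_{[a,b]}}{B_{Total[a,b]}}\,B_i' \quad \text{inside each interval } [a,b]$$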

The result of this scaling is indicated by the dotted line in 
FIG. 8A and corresponds to step 210 in FIG. 8. 

(d) To improve the performance of this algorithm, the encoder can generate a new VBV trajectory using the bit budgets $BB_i$. If there are still VBV underflows, steps (a), (b) and (c) are repeated (step 212 of FIG. 8).
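Steps (a) through (d) can be sketched as a small simulation loop. The VBV model here is an assumption made for illustration: decoding starts with a full buffer, $R_{max}/f$ bits arrive per frame period and are clipped at OVR (the disk reader stalls), occupancy is sampled after each picture is removed, and UNDR defaults to zero. The interval scaling follows the form of step (c) given above; `vbv_trajectory`, `correct_underflow` and the `tol` stopping threshold are hypothetical.

```python
from typing import List, Tuple


def vbv_trajectory(budgets: List[float], bits_per_frame: float,
                   ovr: float) -> Tuple[List[float], List[int]]:
    """VBV occupancy after each picture is removed, plus the frames at which
    the buffer was clipped at OVR (the reader stalls, so overflow is benign)."""
    occ = ovr                       # assumption: decoding starts with a full buffer
    traj: List[float] = []
    full: List[int] = []
    for i, bits in enumerate(budgets):
        occ += bits_per_frame       # R_max / f bits arrive per frame period
        if occ >= ovr:
            occ = ovr               # clipped: no further bits are read
            full.append(i)
        occ -= bits                 # the decoder removes picture i
        traj.append(occ)
    return traj, full


def correct_underflow(budgets: List[float], bits_per_frame: float, ovr: float,
                      undr: float = 0.0, tol: float = 1.0,
                      max_iter: int = 50) -> List[float]:
    """Steps (a)-(d): shrink the budgets inside each underflowing interval."""
    bb = list(budgets)
    if not bb:
        return bb
    for _ in range(max_iter):
        traj, full = vbv_trajectory(bb, bits_per_frame, ovr)
        # (a) intervals [a, b] from one OVR hit up to the frame before the next
        starts = full if full and full[0] == 0 else [0] + full
        ends = [s - 1 for s in starts[1:]] + [len(bb) - 1]
        changed = False
        for a, b in zip(starts, ends):
            undershoot = undr - min(traj[a:b + 1])    # (b) D_[a,b]
            total = sum(bb[a:b + 1])                  # B_Total[a,b]
            if undershoot <= tol or total <= 0:
                continue
            # (c) scale the interval's budgets so its total drops by D_[a,b]
            scale = max(0.0, (total - undershoot) / total)
            for i in range(a, b + 1):
                bb[i] *= scale
            changed = True
        if not changed:             # (d) stop once no underflow above tol remains
            break
    return bb


budgets = [100.0] * 5 + [400.0] * 5 + [50.0] * 10
corrected = correct_underflow(budgets, 100.0, 1000.0)
print(min(vbv_trajectory(corrected, 100.0, 1000.0)[0]))  # within 1 bit of UNDR
```

Because scaling the whole interval removes only part of $D_{[a,b]}$ before the minimum, a single pass generally leaves a smaller underflow, which is why step (d) iterates; in the example the undershoot shrinks from 600 bits to under 1 bit in a few passes, and frames outside the underflowing interval keep their budgets.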
4. The sequence of input frames is now encoded again in 
a second coding pass (step 214 of FIG. 8). In the second 
pass, the rate control quantization scale factor is not fixed 
and is free to vary with virtual buffer fullness. There is no 
VBV enforcement during the second coding pass. 

However, a feedback mechanism is used to account for deviations between $BU_i$, the actual number of bits used to code frame i, and $BB_i$, the bit budget for frame i. The cumulative frame budget deviation $CE_i$ is updated as follows:

$$CE_i = CE_{i-1} + BU_i - BB_i$$

The modified bit budget $BB_{i+1}'$ is computed as:

$$BB_{i+1}' = BB_{i+1} - \delta_f\,CE_i$$

where $\delta_f$ is a constant, e.g. 0.1.

An additional feedback mechanism may be used within a frame or picture. Let $BB_{i,n}$ be the number of bits budgeted for macroblock n in frame i. Let $BU_{i,n}$ be the actual number of bits used to encode macroblock n in frame i. The cumulative macroblock deviation in frame i is given by

$$CE_{i,n} = CE_{i,n-1} + BU_{i,n} - BB_{i,n}$$

The modified macroblock budget for macroblock (n+1) within frame i is given by

$$BB_{i,n+1}' = BB_{i,n+1} - \delta_b\,CE_{i,n}$$

where $\delta_b$ is a constant (e.g. 0.1).

The encoded output stream of the second pass is now transmitted via a channel to a storage medium such as a digital video disk.
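Both second-pass feedback loops, per frame and per macroblock, share one structure, sketched below. The sketch assumes the budget correction subtracts $\delta\,CE$ (so an overspend lowers the next target) and that the deviation is measured against the planned budget rather than the modified one; `encode_with_feedback` and the `encode` callback are hypothetical stand-ins for the encoder.

```python
from typing import Callable, List, Sequence


def encode_with_feedback(budgets: Sequence[float],
                         encode: Callable[[int, float], float],
                         delta: float = 0.1) -> List[float]:
    """Cumulative-deviation feedback over a run of frames (or of the
    macroblocks in one frame). encode(k, target) stands in for the encoder
    and returns the bits actually used for unit k given its target."""
    ce = 0.0                       # CE: running sum of (BU - BB)
    used: List[float] = []
    for k, bb in enumerate(budgets):
        target = bb - delta * ce   # modified budget BB' = BB - delta * CE
        bu = encode(k, target)
        used.append(bu)
        ce += bu - bb              # deviation measured against the planned BB
    return used


# Toy encoder that always overshoots its target by 10 percent.
used = encode_with_feedback([1000.0] * 50, lambda k, t: 1.1 * t)
print(round(used[0]), round(used[-1]))  # 1100 1000
```

With a constant 10 percent overshoot, the accumulated deviation settles where the lowered target exactly compensates, so the bits actually used converge back to the 1000-bit budget.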

The two-pass variable bit rate encoding technique described above may be modified by adding a zero pass (i.e., a pass prior to the first pass). In the zero pass, each field $f_i$ is compared pixel by pixel to the prior field $f_{i-1}$ and to the next prior field $f_{i-2}$. A Mean Square Error or a Mean Absolute Error is obtained in each case. Using this information, and possibly other statistics, the encoder performs inverse telecine processing, scene change detection, and fade detection. In the zero pass, the total activity and average intensity may be determined for the macroblocks in a field. In addition, the first two stages of the three-stage motion estimation hierarchy may be performed in the zero pass. This enables certain motion estimation statistics to be obtained which are useful for scene change detection.
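The field comparison used for inverse telecine can be sketched with a Mean Absolute Error against field $f_{i-2}$, since a field repeated by 3:2 pulldown has the same parity as the field two positions earlier. This is a simplified illustration: fields are plain lists of luma rows, the `threshold` value is a hypothetical tuning parameter, and a real detector would also use the $f_{i-1}$ comparison and other statistics mentioned above.

```python
from typing import List, Sequence

Field = Sequence[Sequence[int]]  # rows of luma samples


def mean_abs_error(a: Field, b: Field) -> float:
    """Pixel-by-pixel Mean Absolute Error between two equally sized fields."""
    total, count = 0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def repeated_fields(fields: List[Field], threshold: float) -> List[int]:
    """Indices i whose field nearly equals field i-2 (same parity), as a
    3:2 pulldown repeat would; these are candidates to drop."""
    return [i for i in range(2, len(fields))
            if mean_abs_error(fields[i], fields[i - 2]) < threshold]


def flat(v: int) -> Field:
    return [[v, v], [v, v]]


# Pulldown order A1 A2 A1 | B2 B1 | C2 C1 C2 | D1 D2 (flat toy fields).
seq = [flat(v) for v in [10, 12, 10, 22, 20, 32, 30, 32, 40, 42]]
print(repeated_fields(seq, 1.0))  # [2, 7]
```

The two detected indices are the third field of the A frame and the third field of the C frame, i.e. exactly the fields that inverse telecine drops.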

It should be noted that the variable bit rate encoding 
technique described above is applicable to a sequence of 
field pictures as well as a sequence of frame pictures (which 
is the example provided above). 

J. Overall Encoder Rate Control Implementation

The overall encoder implementation may be viewed as a pipeline in which several functions are implemented. The pipeline is illustrated in FIG. 9 and is described below. In the discussion, it is useful to distinguish between a previous frame whose encoding is complete, a current frame which is currently being encoded, and a next frame whose encoding has not yet started.

The pipeline functions are as follows: 

init (Step 500 of FIG. 9) All rate control parameters are initialized to default startup values according to the configuration options selected at run-time (such as choice of linear versus non-linear rate control quantization scale factor). This function is executed only at startup (or restart).

pre_pic (Step 510 of FIG. 9) At this point, the master has finished encoding, and the slaves are still busy encoding the current frame. This function finishes all the calculations related to the previously encoded picture, and prepares for the next picture.

1. Compute using VBV feedback. Since the number of bits used by the current frame is not available, the previous VBV fullness is used.

2. Process the next frame's activities to obtain masking 
activity quantization scale factor and estimated bit 
budget per macroblock. 

3. Using an observation window of several frames, detect 
whether there is a fade. Average field brightness and 
field activities are used for this detection. If there is a 
fade, then constrain the motion estimation to a small 
range. Once set, the fade detected flag remains on for 
several frames. 

4. If there is no fade, then test to see if there is a scene 
change. This way, a fast cross-fade is not classified as 
a scene change. 

5. Update the statistics related to the previous frame that are not critical for the current frame's budget. Note that the current frame is being encoded at this time, so this calculation is delayed to reduce the real-time critical path.

compute_budget (Step 520 of FIG. 9) At this point, the slaves are still busy and the VBV fullness is not known. Therefore, the budget computed here is subject to further change.

1. If a scene change was detected, then calculate the new budget using the VBV fullness and coding rate. The VBV fullness is only an estimate, since the exact fullness is not known at this stage. Adjust the virtual buffer fullness according to the bit budget. Use default values for P and B frame budgets.

2. Else, if a fade was detected, then increase the B frame budgets (for example by 20%). Also make sure that the B frame virtual buffer is not fuller than the P frame virtual buffer.

3. Else, use the default formula to compute the bit budget. 
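The branch structure of compute_budget can be sketched as below. This only captures the decision logic: the scene-change recomputation from estimated VBV fullness and coding rate, and the default budget formula, are not spelled out here, so the former is passed in as a caller-supplied function and the latter is represented by leaving the budgets unchanged. `BudgetState`, `scene_change_rule` and `fade_boost` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class BudgetState:
    i_budget: float   # bit budget for I frames
    p_budget: float   # bit budget for P frames
    b_budget: float   # bit budget for B frames
    p_vbuf: float     # P-frame virtual buffer fullness (bits)
    b_vbuf: float     # B-frame virtual buffer fullness (bits)


def compute_budget(state: BudgetState, defaults: BudgetState,
                   scene_change: bool, fade: bool,
                   scene_change_rule: Callable[[BudgetState], BudgetState],
                   fade_boost: float = 0.20) -> BudgetState:
    if scene_change:
        # Hypothetical hook: recompute from the estimated VBV fullness and
        # coding rate and adjust the virtual buffers; P and B use defaults.
        new = scene_change_rule(state)
        return replace(new, p_budget=defaults.p_budget,
                       b_budget=defaults.b_budget)
    if fade:
        # Raise the B budget and keep the B virtual buffer no fuller than P's.
        return replace(state, b_budget=state.b_budget * (1.0 + fade_boost),
                       b_vbuf=min(state.b_vbuf, state.p_vbuf))
    # Default formula is not reproduced in this sketch: pass budgets through.
    return state


s = BudgetState(400000.0, 200000.0, 100000.0, 5000.0, 8000.0)
print(compute_budget(s, s, scene_change=False, fade=True,
                     scene_change_rule=lambda x: x))
```

Testing for a scene change before a fade mirrors the ordering in pre_pic, where the scene change test is only run when no fade has been detected.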
post_pic (Step 530 of FIG. 9) At this point, the slaves report their status.

1. Update the virtual buffer corresponding to the previous 
frame. 

2. Update VBV fullness. 

3. If necessary, insert zero stuffing. 

4. Adjust the previously computed bit budget if necessary. 
If the now completed current frame took more bits than 
planned, it may be necessary to reduce the next frame 
budget. 

5. Report the bit budget to the slaves, and start encoding 
the next frame. 

Conclusion 

A rate control algorithm for an MPEG-2 compliant digital video encoder has been disclosed. The rate control algorithm has embodiments which may be utilized for constant bit rate and variable bit rate encoding.

Finally, the above described embodiments of the inven- 
tion are intended to be illustrative only. Numerous alterna- 
tive embodiments may be devised by those skilled in the art 
without departing from the spirit and scope of the following 
claims. 

We claim: 

1. In an encoder for encoding an input sequence of 
interlaced video frames to generate a compressed bitstream, 
a method for generating a quantization scale factor for 
coding a macroblock in a frame, said method comprising the 
steps of: 

(1) maintaining by said encoder first, second and third 
virtual buffers for I, P and B frames in said sequence of 
frames, 

(2) determining by said encoder for a macroblock n in a frame, a rate control quantization scale factor Q_n, said rate control quantization scale factor being determined as a function of a ratio of virtual buffer fullness to virtual buffer size for the virtual buffer of the frame to which the macroblock belongs,

(3) multiplying the rate control quantization scale factor for the macroblock by a masking activity quantization scale factor QS_n given by



(a(ma„+^ 1 )+m«4»ttz rt +a(m£2 re +A: 1 )) 

where ma_n is a masking activity for the macroblock,

where ma is a masking activity for the entire picture,

where a and k_1 are constants,

(4) coding said macroblock in said encoder using said 
final quantization scale factor to generate a portion of 
said compressed bitstream. 